
In *The Elements of Statistical Learning*, page 588, the authors state that the average of $B$ i.i.d. random variables, each with variance $\sigma^2$, has variance $\frac{1}{B} \sigma^2$.

If the variables are only identically distributed (not necessarily independent) with positive pairwise correlation $\rho$, the variance of the average is

$$\rho \sigma^2 + \frac{1 - \rho}{B} \sigma^2.$$

I don't understand the calculation of the second case.

user1721713
    The value of $\rho$ can be no less than $-1/(B-1)$ and therefore the variance of the average can never be negative. See https://stats.stackexchange.com/questions/72790/bound-for-the-correlation-of-three-random-variables/72798#72798. – whuber Dec 22 '17 at 16:35
  • I have done the calculations myself, and I struggle to see where we use the fact that $\rho\geq0$. – CutePoison Apr 14 '20 at 12:50
  • Let me rephrase: I know that $\rho\geq -1/(B-1)$, but that does not imply that $\rho>0$. Say $B=3$; then $\rho$ can take the value $-0.5$ without violating anything. How come the proof is only for $\rho\geq 0$? – CutePoison Apr 14 '20 at 13:26

1 Answer


Let $X_1, \dots, X_B$ be the corresponding random variables, and let $$\bar{X}_B = \dfrac{1}{B}\sum_{i=1}^{B}X_i$$ be their average.

Then

$$\text{Var}(\bar{X}_B) = \dfrac{1}{B^2}\text{Var}\left(\sum_{i=1}^{B}X_i\right) = \dfrac{1}{B^2}\sum_{i=1}^{B}\sum_{j=1}^{B}\text{Cov}(X_i, X_j)$$ Suppose, in the above summation, that $i = j$. Then $\text{Cov}(X_i, X_j) = \sigma^2$. Exactly $B$ of these occur.

Suppose, in the above summation, that $i \neq j$. Then $\text{Cov}(X_i, X_j) = \rho\sigma^2$ since the variances are identical. There are $B^2 - B = B(B-1)$ of these occurrences. (Notice that there are $B \cdot B = B^2$ total terms in the summation, so $B^2 - B$ is the number of terms that aren't the diagonal $\sigma^2$ terms counted above.)

Hence, $$\sum_{i=1}^{B}\sum_{j=1}^{B}\text{Cov}(X_i, X_j) = B\sigma^2+B(B-1)\rho\sigma^2$$ from which we obtain $$\text{Var}(\bar{X}_B) = \dfrac{1}{B^2}\left(B\sigma^2+B(B-1)\rho\sigma^2\right) = \dfrac{\sigma^2}{B}+\dfrac{B-1}{B}\rho\sigma^2 = \dfrac{\sigma^2}{B}+\rho\sigma^2-\dfrac{1}{B}\rho\sigma^2$$ or $$\rho\sigma^2 +\dfrac{\sigma^2}{B}(1-\rho)$$ as desired.
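The double-sum argument can be sanity-checked numerically. The sketch below (assuming NumPy, with arbitrarily chosen values $B=5$, $\sigma^2=2$, $\rho=0.3$) builds the equicorrelated covariance matrix $\Sigma$ and compares $\text{Var}(\bar{X}_B) = \frac{1}{B^2}\mathbf{1}^\top\Sigma\,\mathbf{1}$ against the closed form:

```python
import numpy as np

# Arbitrary example values (not from the book or the answer above)
B, sigma2, rho = 5, 2.0, 0.3

# Equicorrelation covariance matrix: sigma^2 on the diagonal,
# rho * sigma^2 everywhere off the diagonal
Sigma = sigma2 * (rho * np.ones((B, B)) + (1 - rho) * np.eye(B))

# Var of the average = (1/B^2) * 1^T Sigma 1 (sum of all B^2 covariances)
var_mean = np.ones(B) @ Sigma @ np.ones(B) / B**2

# Closed form derived above
formula = rho * sigma2 + (1 - rho) / B * sigma2

print(var_mean, formula)  # the two values agree
```

Changing $B$, $\sigma^2$, or $\rho$ leaves the two quantities equal, which is exactly the identity derived above.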

Clarinetist
  • And why is it that $\rho$ cannot be negative? – CutePoison Apr 14 '20 at 12:46
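As a side note on the question raised in the comments (a sketch, not part of the answer above): rearranging the result shows exactly where the bound on $\rho$ matters,

$$\text{Var}(\bar{X}_B) = \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2 = \frac{\sigma^2}{B}\bigl(1 + (B-1)\rho\bigr),$$

which is nonnegative precisely when $\rho \geq -1/(B-1)$. The derivation itself never uses $\rho \geq 0$; that assumption only matters for the interpretation in the book, where $\rho$ is a positive correlation between trees.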