Variance of mean of correlated variables

Question

Page 228 of THIS BOOK provides the formula for the variance of the mean of more than two correlated random variables:

where $m$ is the number of variables, $r$ is the correlation between the variables, and $V$ is the variance of each of the variables.

The same book, however, provides a different formula for the variance of the mean of two correlated variables:

The formula for more than two variables doesn't seem to be an extension of the formula for two variables. Specifically, the two-variable formula contains a $2$ that is absent from the formula for more than two variables.

Is this by design?

0.1*sqrt(0.25*0.5)+0.2*sqrt(0.25*0.75)+0.3*sqrt(0.5*0.75) – user158565 Aug 05 '19 at 01:50

Joe · Accepted Answer · 2019-08-05T03:05:49.743


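The formula for $m>2$ is a generalization of the two-variable formula. To see this, set $m=2$ and look at each piece in turn.

The leading factor becomes $$ \left(\frac{1}{m}\right)^2 = \frac{1}{4}. $$

The sum of the $V_i$ equals $V_1 + V_2$.

The last summation runs over all pairs with $i \neq j$, so it contains both the $(1,2)$ term and the $(2,1)$ term. Because correlation is symmetric, $r_{12} = r_{21} = r$, and the two terms are equal:
$$ r_{12} \cdot \sqrt{V_1} \cdot \sqrt{V_2} + r_{21} \cdot \sqrt{V_2} \cdot \sqrt{V_1} = 2r \cdot \sqrt{V_1} \cdot \sqrt{V_2} $$

That is where the $2$ in the two-variable formula comes from.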
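Putting the pieces together may make the correspondence easier to see. The general form written here is reconstructed from the terms discussed in this answer rather than quoted from the book, and the symbol $\operatorname{Var}(\bar{Y})$ for the variance of the mean is my own notation:

$$ \operatorname{Var}(\bar{Y}) = \left(\frac{1}{m}\right)^2 \left( \sum_{i=1}^{m} V_i + \sum_{i \neq j} r_{ij} \sqrt{V_i} \sqrt{V_j} \right) $$

Substituting $m=2$ gives

$$ \operatorname{Var}(\bar{Y}) = \frac{1}{4} \left( V_1 + V_2 + 2r \sqrt{V_1} \sqrt{V_2} \right), $$

so the two-variable formula is the special case in which the two off-diagonal terms have been collected into one.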

Here is some R code for computing this sum:

myVariances <- c(0.25,0.5,0.75) # this is a vector of the variances

myCorrelations <- matrix(data = c(1,0.1,0.2,0.1,1,0.3,0.2,0.3,1), nrow = 3, ncol = 3) # this is the matrix of correlations

mySum <- 0 # initializes mySum to zero

for (i in 1:nrow(myCorrelations)) {
  for (j in 1:nrow(myCorrelations)) {
    mySum <- mySum + myCorrelations[i,j] * sqrt(myVariances[[i]]) * sqrt(myVariances[[j]])
  }
} # this loop computes the sum

(1/nrow(myCorrelations))^2 * mySum # this multiplies that sum by (1/m)^2

The above code assumes that your matrix of correlations includes 1's on the diagonal, to represent that the variables are perfectly correlated with themselves.
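As a sketch of an alternative, the same quantity can probably be computed without an explicit loop. This assumes the same inputs as above (a vector of variances and a correlation matrix with 1's on the diagonal); the names `mySDs`, `covMatrix`, and `varOfMean` are mine and purely illustrative.

```r
# Inputs copied from the example above
myVariances <- c(0.25, 0.5, 0.75)
myCorrelations <- matrix(data = c(1, 0.1, 0.2,
                                  0.1, 1, 0.3,
                                  0.2, 0.3, 1), nrow = 3, ncol = 3)

m <- length(myVariances)   # number of variables
mySDs <- sqrt(myVariances) # standard deviations

# outer(mySDs, mySDs)[i, j] is sqrt(V_i) * sqrt(V_j); multiplying it
# elementwise by the correlations gives the covariance matrix
covMatrix <- myCorrelations * outer(mySDs, mySDs)

# Summing every entry covers the diagonal (the V_i) and all i != j terms
varOfMean <- sum(covMatrix) / m^2
varOfMean
```

Because the diagonal of the correlation matrix is 1, the diagonal of `covMatrix` holds the variances themselves, so a single `sum()` covers both summations in the formula.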

edited Aug 05 '19 at 03:05

answered Aug 05 '19 at 02:32


That’s close, but when the sum says for all i not equal to j, it’s saying that you have, for example, to let i=1 and j=2 AND also let i=2 and j=1. Since those terms will be equal, you could just compute one of them and multiply by two. – Joe Aug 05 '19 at 02:40
For your example of 3 variables? Well, your code is close, it just needs three 2’s – Joe Aug 05 '19 at 02:44
((1/3)^2)*(sum(var1, var2, var3) + 2*r12*sqrt(var1*var2)+2*r13*sqrt(var1*var3)+2*r23*sqrt(var2*var3)) – Joe Aug 05 '19 at 02:46
Ah, ok. Maybe I can. How are those variables stored in your environment? Like, do you have a vector of variances and a matrix of correlations? – Joe Aug 05 '19 at 02:49

Welcome to Stats.SE and thank you for your answer. Take the opportunity to take the [tour](https://stats.stackexchange.com/tour), if you haven't done it already. See also some tips on [formatting help](https://stats.stackexchange.com/help/formatting) and on writing down equations using [LaTeX / MathJax](https://math.meta.stackexchange.com/q/5020). – Ertxiem - reinstate Monica Aug 05 '19 at 02:51
@Ertxiem, thank you. I had no idea how other posts were able to make such nice-looking formulas! – Joe Aug 05 '19 at 03:07
I had a small, but significant error in the code when I first typed it. I forgot to add the sum terms to mySum before assigning them back to mySum. I fixed that. – Joe Aug 05 '19 at 03:10
Yes, if all of the r_ij are equal, that term can be factored out of the sum. That would simplify the equation, and make things easier if you were computing it by hand, but as far as the code, it's already only a few lines. – Joe Aug 05 '19 at 03:16
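To illustrate the factoring mentioned in the last comment: if every pair shares one correlation, the cross term can be written with the identity $\sum_{i \neq j} \sqrt{V_i}\sqrt{V_j} = \left(\sum_i \sqrt{V_i}\right)^2 - \sum_i V_i$. The sketch below assumes a hypothetical common correlation of 0.2, which is not a value from the question, and the variable names are illustrative.

```r
myVariances <- c(0.25, 0.5, 0.75) # variances from the example above
r <- 0.2                          # hypothetical common correlation (an assumption)

m <- length(myVariances)
mySDs <- sqrt(myVariances)

# Sum of sqrt(V_i) * sqrt(V_j) over all i != j, via the identity
# (sum of SDs)^2 minus the sum of the variances
crossSum <- sum(mySDs)^2 - sum(myVariances)

# The common r is factored out of the cross term
varOfMeanEqualR <- (sum(myVariances) + r * crossSum) / m^2
varOfMeanEqualR
```

If the variances are also all equal to some $V$, this reduces further to $\frac{V}{m}\left(1 + (m-1)r\right)$.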
