
I am trying to derive the covariance of two sample means and get confused at one point. Given a sample of size $n$ with paired dependent observations $x_i$ and $y_i$, realizations of RVs $X$ and $Y$, with sample means $\bar{x}$ and $\bar{y}$, I try to derive $cov(\bar{x},\bar{y})$.

I am relatively sure the result should be

$cov(\bar{x},\bar{y})=\frac{1}{n}cov(X,Y)$

However, I arrive at

$$cov(\bar{x},\bar{y})=E(\bar{x}\bar{y})-\mu_x\mu_y = E(\frac{1}{n^2}\sum x_i \sum y_i) -\mu_x\mu_y =\frac{1}{n^2} n^2 E(x_i y_i) -\mu_x\mu_y=cov(X,Y)$$

I used

$$E\left(\frac{1}{n^2}\sum x_i \sum y_i\right)=\frac{1}{n^2} E(x_1y_1+x_2y_1+\dots+x_ny_n)=\frac{1}{n^2} n^2 E(x_iy_i)$$

There must be a flaw in my thinking somewhere.

tomka
    I think your reasoning is essentially correct: http://stats.stackexchange.com/questions/59546/estimating-the-covariance-of-the-means-from-two-samples, that is, $\mathrm{cov}(\bar{x},\bar{y}) = \mathrm{cov}(X,Y)$ – sandris Jul 28 '15 at 16:53
  • So the difference lies in the assumption about covariances in paired versus independent samples. The upper result is the one for paired samples, the lower the one for independent samples, where $E(x_iy_j)=E(x_i)E(y_j)$ when $i \ne j$ – tomka Jul 28 '15 at 17:00
    If you are comfortable with deriving the fact that the variance of the sample mean is $1/n$ times the variance, then the result is immediate because [covariances are variances](http://stats.stackexchange.com/a/142472). As far as your mistake goes, note that $\text{cov}(x_i,y_j)=0$ for $i\ne j$. It also helps to know that whenever you are working with covariances or variances you may always assume the means are zero, because these are *central* moments that don't depend on the means at all. – whuber Jul 28 '15 at 17:31
  • What I do not yet fully understand is why $cov(x_i,y_j)=0$ for $i \ne j$ holds when I have paired samples, but not when I have independent samples. Can you explain? – tomka Jul 28 '15 at 19:45
    Your use of the term "sample" implicitly means $(x_i,y_i)$ is independent of $(x_j,y_j)$ for $i\ne j$. From this it is immediate that their covariances (if they exist) must be zero. – whuber Jul 28 '15 at 20:44

2 Answers


Covariance is a bilinear function meaning that $$ \operatorname{cov}\left(\sum_{i=1}^n a_iC_i, \sum_{j=1}^m b_jD_j\right) = \sum_{i=1}^n \sum_{j=1}^m a_i b_j\operatorname{cov}(C_i,D_j).$$ There is no need to mess with means etc.

Applying this to the question of the covariance of the sample means of $n$ independent paired samples $(X_i, Y_i)$ (note: the pairs are independent bivariate random variables; we are not claiming that $X_i$ and $Y_i$ are independent random variables), we have that \begin{align} \operatorname{cov}\left(\bar{X},\bar{Y}\right) &= \operatorname{cov}\left(\frac{1}{n}\sum_{i=1}^n X_i, \frac 1n\sum_{j=1}^n Y_j\right)\\ &= \frac{1}{n^2}\sum_{i=1}^n \sum_{j=1}^n \operatorname{cov} (X_i, Y_j)\\ &= \frac{1}{n^2}\sum_{i=1}^n \operatorname{cov} (X_i, Y_i) &\scriptstyle{\text{since $X_i$ and $Y_j$ are independent, and thus uncorrelated, for $i \neq j$}}\\ &= \frac 1n\operatorname{cov} (X, Y) \end{align}
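This result is easy to check numerically. The following sketch (my own illustration, not part of the original answer) draws many paired samples with an assumed dependence $Y = X + \varepsilon$, so that $\operatorname{cov}(X,Y) = \operatorname{var}(X) = 1$, and compares the empirical covariance of the sample means against $\operatorname{cov}(X,Y)/n$:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10            # sample size
trials = 200_000  # number of simulated paired samples

# Correlated pairs (X_i, Y_i): Y = X + independent noise, so cov(X, Y) = var(X) = 1
x = rng.normal(size=(trials, n))
y = x + rng.normal(size=(trials, n))

xbar = x.mean(axis=1)
ybar = y.mean(axis=1)

# Empirical covariance of the sample means across trials
emp = np.cov(xbar, ybar)[0, 1]
print(emp)  # ≈ cov(X, Y) / n = 1/10
```

The empirical value settles near $0.1$, matching $\frac{1}{n}\operatorname{cov}(X,Y)$ rather than $\operatorname{cov}(X,Y)$.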

Dilip Sarwate
  • I think there are $n^2$ terms, but $n(n-1)$ of them cancel with $\mu_x\mu_y$ due to independence. – tomka Jul 28 '15 at 18:56

I think the algebra issue is resolved with the following:

${1 \over n^2}E(\sum_{i=1}^n x_i \sum_{i=1}^n y_i)={1 \over n^2}E(\sum_{i=1}^n x_i y_i +\sum_{i\ne j}x_i y_j)$

$={1 \over n^2}(n(Cov(x_i,y_i)+\mu_X \mu_Y)+n(n-1)\mu_X \mu_Y)$

$={1 \over n^2}(n\,Cov(x_i,y_i)+n^2 \mu_X \mu_Y)=Cov(x_i,y_i)/n+ \mu_X \mu_Y$

Subtracting $\mu_X \mu_Y$ then gives $Cov(\bar{x},\bar{y}) = Cov(x_i,y_i)/n$, as expected.
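The decomposition into $n$ matched terms and $n(n-1)$ cross terms can be verified with a quick simulation (my own sketch, using assumed nonzero means so the $\mu_X\mu_Y$ terms are visible, and $Cov(x_i,y_i)=1$ by construction):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 5
trials = 300_000
mu_x, mu_y = 2.0, -1.0

# Paired sample with nonzero means; cov(x_i, y_i) = var(x_i) = 1
x = rng.normal(loc=mu_x, size=(trials, n))
y = (x - mu_x) + mu_y + rng.normal(size=(trials, n))

# Left-hand side: E[(sum x_i)(sum y_i)] / n^2, estimated across trials
lhs = (x.sum(axis=1) * y.sum(axis=1)).mean() / n**2

# Right-hand side of the decomposition: cov/n + mu_x * mu_y
rhs = 1.0 / n + mu_x * mu_y
print(lhs, rhs)  # both ≈ -1.8
```

Both sides agree, confirming that the cross terms contribute $\mu_X\mu_Y$ rather than $E(x_iy_i)$.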

JimB