
Let there be two samples of size $n$, $x_i$ and $y_i$, from two different normal distributions.

What is $\operatorname{cov}(\bar X_n, \bar Y_n)$? And how can it be estimated?

The motivation for my question is to understand whether there is a way to know if two paired samples are correlated in such a way that their expectations "should" be compared using a paired t-test.

Thanks.

Scortchi - Reinstate Monica
Tal Galili
  • Do you mean you have *paired* data $(x_i,y_i)$ and you would like to determine whether to use a paired test or unpaired test to compare $\overline{x}$ to $\overline{y}$? If you have no such pairing, how do you propose to make any sense of a covariance in the first place? My concern is that if you base a preliminary decision on whether to use a paired test by inspecting these data, then you will change the size and power of the test you do choose (in a complicated way), thereby invalidating any p-value it produces. – whuber May 20 '13 at 18:45
  • Hi Whuber. First - yes, the data is paired. Second - I agree with you that the post-test t-test is not an easy thing to understand. Feel free to assume that the two tests are done on two different samples. – Tal Galili May 20 '13 at 18:56
  • Isn't your course of action clear, then? If the first set of data has a positive covariance, use a paired t-test for the second set; otherwise use an unpaired t-test. I believe this procedure has greater average power than any other (conditional on observing the first set and selecting the form of t-test before observing the second set). – whuber May 20 '13 at 19:30
  • Hi Whuber. I agree with you. However, I'm trying to understand the relation between the covariance of my observations and that of their averages, since, if I understand correctly, the gain in the paired test comes from $\operatorname{var}(\bar x - \bar y)=\operatorname{var}(\bar x)+\operatorname{var}(\bar y)-2\operatorname{cov}(\bar x , \bar y)$ (yet I don't know how the last piece works). – Tal Galili May 20 '13 at 20:08
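whuber's suggestion in the comments (choose the test from the sign of the sample covariance in a first set of pairs, then run that test on a second, independent set) can be sketched as follows. This is a minimal sketch, not anyone's reference implementation: it assumes equal sample sizes, uses a pooled-variance Student's t for the unpaired case, and the helper names (`choose_and_test`, `paired_t`, `pooled_t`) are hypothetical. It returns only the t statistic and degrees of freedom; a p-value would come from a t-distribution CDF such as `scipy.stats.t.sf`.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def sample_cov(x, y):
    """Sample covariance with divisor n - 1 (unbiased for cov(X, Y) under iid pairs)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def paired_t(x, y):
    """Paired t statistic: a one-sample t on the differences, df = n - 1."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return mean(d) / math.sqrt(sample_cov(d, d) / n), n - 1


def pooled_t(x, y):
    """Unpaired two-sample t with pooled variance (assumes equal n), df = 2n - 2."""
    n = len(x)
    se = math.sqrt((sample_cov(x, x) + sample_cov(y, y)) / n)
    return (mean(x) - mean(y)) / se, 2 * n - 2


def choose_and_test(pilot_x, pilot_y, x, y):
    """Pick the test from the pilot pairs, then apply it to the fresh pairs."""
    if sample_cov(pilot_x, pilot_y) > 0:
        return ("paired",) + paired_t(x, y)
    return ("unpaired",) + pooled_t(x, y)


if __name__ == "__main__":
    rng = random.Random(0)

    def draw_pairs(n, rho=0.7):
        # Hypothetical data: pairs with within-pair correlation rho and a mean shift of 0.3.
        xs, ys = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            xs.append(z1)
            ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2 + 0.3)
        return xs, ys

    pilot = draw_pairs(30)
    fresh = draw_pairs(30)
    print(choose_and_test(*pilot, *fresh))
```

Keeping the pilot set separate from the set that is actually tested is what avoids the problem whuber raises: the choice of test does not depend on the data that produce the p-value.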

2 Answers


\begin{eqnarray} \text{cov}(\bar X_n, \bar Y_n) &=& \text{cov}(1/n \sum X_i, 1/n \sum Y_j)\\ &=& 1/n^2 \cdot \text{cov}( \sum X_i, \sum Y_j)\\ &=& 1/n^2 \cdot \sum_i \sum_j \text{cov}( X_i, Y_j) \end{eqnarray}

To go further, we need to specify something about the covariances. If the samples are iid random samples where $\text{cov}(X_i,Y_j)$ is constant over all $i,j$:

\begin{eqnarray} \quad\quad &=& 1/n^2 \cdot n^2 \text{cov}( X, Y)\\ \quad\quad &=& \text{cov}( X, Y)\, . \end{eqnarray}

If instead (and as seems to be the case here) we're talking about paired data, where $X_i$ and $Y_j$ are only correlated when $i=j$, then:

\begin{eqnarray} \quad\quad &=& 1/n^2 \cdot \sum_i \sum_j \text{cov}( X_i, Y_j)\\ \quad\quad &=& 1/n^2 \cdot n \cdot \text{cov}( X_i, Y_i)\\ \quad\quad &=& 1/n \cdot \text{cov}( X_i, Y_i)\\ \quad\quad &=& 1/n \cdot \rho\, \sigma_x \sigma_y, \end{eqnarray}

where $\rho$ is the correlation between $X$ and $Y$ within a pair, and $\sigma_x$, $\sigma_y$ are the standard deviations of $X$ and $Y$.
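A quick Monte Carlo check of the paired-data result compares the empirical covariance of the two sample means with $\rho\,\sigma_x\sigma_y/n$. This sketch assumes iid bivariate-normal pairs (correlated within a pair, independent across pairs); `simulate_cov_of_means` and its parameters are hypothetical names chosen for illustration.

```python
import math
import random


def simulate_cov_of_means(n, rho, sx, sy, reps=20000, seed=1):
    """Monte Carlo estimate of cov(Xbar_n, Ybar_n) for n iid bivariate-normal pairs.

    Within a pair corr(X_i, Y_i) = rho; distinct pairs are independent, so
    cov(X_i, Y_j) = 0 for i != j (the paired-data case).
    """
    rng = random.Random(seed)
    xbars, ybars = [], []
    for _ in range(reps):
        sum_x = sum_y = 0.0
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            # Correlated pair built from two independent standard normals.
            sum_x += sx * z1
            sum_y += sy * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        xbars.append(sum_x / n)
        ybars.append(sum_y / n)
    mx, my = sum(xbars) / reps, sum(ybars) / reps
    return sum((a - mx) * (b - my) for a, b in zip(xbars, ybars)) / (reps - 1)


n, rho, sx, sy = 10, 0.6, 2.0, 3.0
print(simulate_cov_of_means(n, rho, sx, sy))  # Monte Carlo estimate
print(rho * sx * sy / n)                      # formula rho*sigma_x*sigma_y/n, about 0.36
```

With these settings the simulated value should land close to 0.36; setting `rho=0` should give a value close to zero.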

Glen_b

Here is an answer derived using the theory of 'moments of moments', using power sum notation, and leaving the grunt work to mathStatica. In particular, in power sum notation, let:

$$s_{a,b}=\sum _{i=1}^n X_i^a Y_i^b$$

Then $\operatorname{cov}(\bar X_n, \bar Y_n) = \operatorname{cov}\left(\frac{s_{1,0}}{n}, \frac{s_{0,1}}{n}\right)$, and since the covariance operator is just the $\{1,1\}$ CentralMoment, the solution is:

$$\operatorname{cov}(\bar X_n, \bar Y_n) = \frac{\mu_{1,1}}{n}$$

where $\mu_{1,1} = E\big[(X - E[X])(Y - E[Y])\big]$ denotes the $\{1,1\}$ central moment of the population.

Since $\mu_{1,1}$ is just the population covariance, the solution is:

$$\operatorname{cov}(\bar X_n, \bar Y_n) = \frac{\operatorname{cov}(X, Y)}{n} $$

In the case of independence, $\operatorname{cov}(X,Y)$ is, of course, zero.
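For the "how can it be estimated" part of the question: since $\operatorname{cov}(\bar X_n, \bar Y_n) = \operatorname{cov}(X, Y)/n$, one natural estimate is the sample covariance of the pairs divided by $n$. It is unbiased when the pairs are iid, because the sample covariance with divisor $n-1$ is unbiased for $\operatorname{cov}(X,Y)$. The sketch below assumes iid pairs and uses hypothetical helper names. It also plugs the estimate into $\operatorname{var}(\bar X_n - \bar Y_n) = \operatorname{var}(\bar X_n) + \operatorname{var}(\bar Y_n) - 2\operatorname{cov}(\bar X_n, \bar Y_n)$ to show where a paired analysis gains over an unpaired one.

```python
def sample_cov(x, y):
    """Sample covariance with divisor n - 1 (unbiased for cov(X, Y) under iid pairs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def est_cov_of_means(x, y):
    """Estimate cov(Xbar_n, Ybar_n) = cov(X, Y) / n from a single paired sample."""
    return sample_cov(x, y) / len(x)


def est_var_mean_diff(x, y, paired=True):
    """Estimate var(Xbar_n - Ybar_n) = var(Xbar_n) + var(Ybar_n) - 2 cov(Xbar_n, Ybar_n).

    With paired=False the covariance term is dropped, as an unpaired analysis implicitly does.
    """
    n = len(x)
    cov_term = 2 * est_cov_of_means(x, y) if paired else 0.0
    return (sample_cov(x, x) + sample_cov(y, y)) / n - cov_term


# Hypothetical toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [0, 2, 2, 3, 4]
print(est_cov_of_means(x, y))                 # approximately 0.45
print(est_var_mean_diff(x, y))                # approximately 0.04
print(est_var_mean_diff(x, y, paired=False))  # approximately 0.94
```

Because $s_d^2 = s_x^2 + s_y^2 - 2 s_{xy}$ holds exactly for the differences $d_i = x_i - y_i$, the paired estimate equals $s_d^2/n$, the quantity a paired t-test uses; on the toy data it is far smaller than the unpaired value because the sample covariance is positive.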

wolfies