Estimating the covariance of the means from two samples?

Question

Let there be two samples of size $n$, $x_i$ and $y_i$ from two different normal distributions.

What is $\operatorname{cov}(\bar X_n, \bar Y_n)$? And how can it be estimated?

The motivation for my question is to understand whether there is a way to know if two paired samples are correlated in such a way that their expectations "should" be compared using a paired t-test.

Thanks.

Do you mean you have *paired* data $(x_i,y_i)$ and you would like to determine whether to use a paired test or unpaired test to compare $\overline{x}$ to $\overline{y}$? If you have no such pairing, how do you propose to make any sense of a covariance in the first place? My concern is that if you base a preliminary decision to use a paired test or not by inspecting these data, then you will change the size and power of the test you do choose (in a complicated way), thereby invalidating any p-value it produces. — whuber, May 20 '13 at 18:45
Hi Whuber. First - yes, the data is paired. Second - I agree with you that the post-test t-test is not an easy thing to understand. Feel free to assume that the two tests are done on two different samples. — Tal Galili, May 20 '13 at 18:56
Isn't your course of action clear, then? If the first set of data has a positive covariance, use a paired t-test for the second set; otherwise use an unpaired t-test. I believe this procedure has greater average power than any other (conditional on observing the first set and selecting the form of t-test before observing the second set). — whuber, May 20 '13 at 19:30
Hi Whuber. I agree with you. However, I'm trying to understand the relation between the covariance of my observations and that of their averages, since, if I understand correctly, the gain in the paired test is due to having $\operatorname{var}(\bar x - \bar y)=\operatorname{var}(\bar x)+\operatorname{var}(\bar y)-2\operatorname{cov}(\bar x , \bar y)$ (yet I don't know how the last piece works). — Tal Galili, May 20 '13 at 20:08

Glen_b · Accepted Answer · 2021-09-12T01:43:17.387

\begin{eqnarray} \text{cov}(\bar X_n, \bar Y_n) &=& \text{cov}(1/n \sum X_i, 1/n \sum Y_j)\\ &=& 1/n^2 \cdot \text{cov}( \sum X_i, \sum Y_j)\\ &=& 1/n^2 \cdot \sum_i \sum_j \text{cov}( X_i, Y_j) \end{eqnarray}

To go further, we need to specify something about the covariances. If the samples are iid random samples where $\text{cov}(X_i,Y_j)$ is constant over all $i,j$:

\begin{eqnarray} \quad\quad &=& 1/n^2 \cdot n^2 \text{cov}( X, Y)\\ \quad\quad &=& \text{cov}( X, Y)\, . \end{eqnarray}

If instead (as seems to be the case here) we're talking about paired data, where $X_i$ and $Y_j$ are only correlated when $i=j$, then:

\begin{eqnarray} \quad\quad &=& 1/n^2 \cdot \sum_i \sum_j \text{cov}( X_i, Y_j)\\ \quad\quad &=& 1/n^2 \cdot n \cdot \text{cov}( X_i, Y_i)\\ \quad\quad &=& 1/n \cdot \text{cov}( X_i, Y_i)\\ \quad\quad &=& 1/n \cdot \rho\, \sigma_x \sigma_y, \end{eqnarray}

where $\rho$ is the correlation between $X$ and $Y$ pairs.
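
The paired-data result can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original derivation: the function name `simulate_cov_of_means` and the parameter values are illustrative assumptions, and the pairs are assumed to be bivariate normal with correlation $\rho$ within a pair and independence across pairs.

```python
import math
import random


def simulate_cov_of_means(n, rho, sigma_x, sigma_y, reps=20000, seed=0):
    """Monte Carlo estimate of cov(mean(X), mean(Y)) for paired normal data.

    Hypothetical helper for illustration: pairs (X_i, Y_i) are correlated
    with correlation rho, and different pairs are independent.
    """
    rng = random.Random(seed)
    xbars, ybars = [], []
    for _ in range(reps):
        sum_x = sum_y = 0.0
        for _ in range(n):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            # Build a correlated pair from two independent standard normals.
            sum_x += sigma_x * z1
            sum_y += sigma_y * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        xbars.append(sum_x / n)
        ybars.append(sum_y / n)
    mean_xbar = sum(xbars) / reps
    mean_ybar = sum(ybars) / reps
    # Sample covariance of the simulated pairs of means.
    return sum(
        (a - mean_xbar) * (b - mean_ybar) for a, b in zip(xbars, ybars)
    ) / (reps - 1)


n, rho, sigma_x, sigma_y = 10, 0.6, 2.0, 3.0
simulated = simulate_cov_of_means(n, rho, sigma_x, sigma_y)
theoretical = rho * sigma_x * sigma_y / n
print(f"simulated: {simulated:.4f}, theoretical: {theoretical:.4f}")
```

With enough replications the simulated value should land close to $\rho\,\sigma_x\sigma_y/n$, up to Monte Carlo error.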

wolfies · Answer 2 · 2017-05-30T08:14:02.623

Here is an answer derived using the theory of 'moments of moments', using power sum notation, and leaving the grunt work to mathStatica. In particular, in power sum notation, let:

$$s_{a,b}=\sum _{i=1}^n X_i^a Y_i^b$$

Then, $\operatorname{cov}(\bar X_n, \bar Y_n) = \operatorname{cov}\left(\frac{s_{1,0}}{n}, \frac{s_{0,1}}{n}\right)$ ... and since the covariance operator is just the {1,1} CentralMoment, the solution is:

$$\frac{\mu_{1,1}}{n}$$

where $\mu_{1,1}$ denotes the {1,1} central moment of the population ...

That is, the solution is:

$$\operatorname{cov}(\bar X_n, \bar Y_n) = \frac{\operatorname{cov}(X, Y)}{n} $$

In the case of independence, $\operatorname{cov}(X,Y)$ is, of course, zero.
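
On the estimation part of the question, one natural option, which neither answer spells out, is the plug-in estimator: divide the usual sample covariance of the pairs by $n$. The sketch below assumes iid pairs; the helper names `sample_cov` and `estimate_cov_of_means` and the simulated data are hypothetical. It also shows how this term enters the variance of $\bar x - \bar y$ used by the paired t-test.

```python
import random


def sample_cov(x, y):
    """Unbiased sample covariance (divisor n - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def estimate_cov_of_means(x, y):
    """Plug-in estimate of cov(mean(X), mean(Y)) for paired data."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    return sample_cov(x, y) / len(x)


# Illustrative paired data (an assumption, not from the question).
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(50)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]
n = len(x)

cov_means_hat = estimate_cov_of_means(x, y)

# var(xbar - ybar) = var(xbar) + var(ybar) - 2 cov(xbar, ybar)
var_diff_paired = sample_cov(x, x) / n + sample_cov(y, y) / n - 2 * cov_means_hat

# The same quantity computed directly from the within-pair differences,
# which is what the paired t-test uses.
d = [a - b for a, b in zip(x, y)]
var_diff_direct = sample_cov(d, d) / n

print(cov_means_hat, var_diff_paired, var_diff_direct)
```

The two variance estimates agree up to floating-point error, which is why a positive estimated covariance shrinks the standard error of the paired comparison relative to the unpaired one.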
