Distribution of the pooled variance in paired samples

Question

Suppose a bivariate normal populations with means $\mu_1$ and $\mu_2$ and equal variance $\sigma^2$ but having a correlation of $\rho$.

Taking a paired sample, it is possible to compute the pooled variance. If $S^2_1$ and $S^2_2$ are the sample variance of the first elements of the pairs and the second elements of the pairs respectively, then, let's note $S_p^2 = \frac{S^2_1+S^2_2}{2}$ the pooled variance (equivalent to the mean of the variances as the samples sizes are the same for the first elements and the second elements).

My question is: how can we demonstrate that the distribution of $S_p^2 / \sigma^2 \approx \chi^2_\nu / \nu$ with $\nu$ the degree of freedom equal to $2(n-1)/(1+\rho^2)$?

If this result is well known, what reference provided the original demonstration?

Ben · Accepted Answer · 2020-08-11T23:04:57.593

I'm not sure about a reference for this result, but it is possible to derive it relatively easily, so I hope that suffices. One way to approach this problem is to look at it as a problem involving a quadratic form taken on a normal random vector. The pooled sample variance can be expressed as a quadratic form of this kind, and these quadratic forms are generally approximated using the chi-squared distribution (with exact correspondence in some cases).

Derivation of the result: In order to show where your assumptions come into the derivation, I will do the first part of the derivation without assuming equal variances for the two groups. If we denote your vectors by $\mathbf{X} = (X_1,...,X_n)$ and $\mathbf{Y} = (Y_1,...,Y_n)$ then your stipulated problem gives the joint normal distribution:

$$\begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \end{bmatrix} \sim \text{N} (\boldsymbol{\mu}, \mathbf{\Sigma} ) \quad \quad \quad \boldsymbol{\mu} = \begin{bmatrix} \mu_X \mathbf{1} \\ \mu_Y \mathbf{1} \end{bmatrix} \quad \quad \quad \mathbf{\Sigma} = \begin{bmatrix} \sigma_X^2 \mathbf{I} & \rho \sigma_X \sigma_Y \mathbf{I} \\ \rho \sigma_X \sigma_Y \mathbf{I} & \sigma_Y^2 \mathbf{I} \end{bmatrix}.$$

Letting $\mathbf{C}$ denote the $n \times n$ centering matrix, you can write the pooled sample variance in this problem as the quadratic form:

$$\begin{align} S_\text{pooled}^2 &= \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \end{bmatrix}^\text{T} \mathbf{A} \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \end{bmatrix} \quad \quad \quad \mathbf{A} \equiv \frac{1}{2(n-1)} \begin{bmatrix} \mathbf{C} & \mathbf{0} \\ \mathbf{0} & \mathbf{C} \end{bmatrix}. \\[6pt] \end{align}$$

Now, using standard formulae for the mean and variance of quadradic forms of normal random vectors, and noting that $\mathbf{C}$ is an idempotent matrix (i.e., $\mathbf{C} = \mathbf{C}^2$), you have:

$$\begin{align} \mathbb{E}(S_\text{pooled}^2) &= \text{tr}(\mathbf{A} \mathbf{\Sigma}) + \boldsymbol{\mu}^\text{T} \mathbf{A} \boldsymbol{\mu} \\[6pt] &= \text{tr} \Bigg( \frac{1}{2(n-1)} \begin{bmatrix} \sigma_X^2 \mathbf{C} & \rho \sigma_X \sigma_Y \mathbf{C} \\ \rho \sigma_X \sigma_Y \mathbf{C} & \sigma_Y^2 \mathbf{C} \end{bmatrix} \Bigg) + \mathbf{0} \\[6pt] &= \frac{1}{2(n-1)} \text{tr} \Bigg( \begin{bmatrix} \sigma_X^2 \mathbf{C} & \rho \sigma_X \sigma_Y \mathbf{C} \\ \rho \sigma_X \sigma_Y \mathbf{C} & \sigma_Y^2 \mathbf{C} \end{bmatrix} \Bigg) \\[6pt] &= \frac{1}{2(n-1)} \Bigg[ n \times \frac{n-1}{n} \cdot \sigma_X^2 + n \times \frac{n-1}{n} \cdot \sigma_Y^2 \Bigg] \\[6pt] &= \frac{\sigma_X^2 + \sigma_Y^2}{2}, \\[12pt] \mathbb{V}(S_\text{pooled}^2) &= 2 \text{tr}(\mathbf{A} \mathbf{\Sigma} \mathbf{A} \mathbf{\Sigma}) + 4 \boldsymbol{\mu}^\text{T} \mathbf{A} \mathbf{\Sigma} \mathbf{A} \boldsymbol{\mu} \\[6pt] &= 2 \text{tr} \Bigg( \frac{1}{4(n-1)^2} \begin{bmatrix} \sigma_X^2 \mathbf{C} & \rho \sigma_X \sigma_Y \mathbf{C} \\ \rho \sigma_X \sigma_Y \mathbf{C} & \sigma_Y^2 \mathbf{C} \end{bmatrix}^2 \Bigg) + \mathbf{0} \\[6pt] &= \frac{1}{2(n-1)^2} \text{tr} \Bigg( \begin{bmatrix} (\sigma_X^4 + \rho^2 \sigma_X^2 \sigma_Y^2) \mathbf{C} & (\sigma_X^2 + \sigma_Y^2) \rho \sigma_X \sigma_Y \mathbf{C} \\ (\sigma_X^2 + \sigma_Y^2) \rho \sigma_X \sigma_Y \mathbf{C} & (\sigma_Y^4 + \rho^2 \sigma_X^2 \sigma_Y^2) \mathbf{C} \end{bmatrix} \Bigg) \\[6pt] &= \frac{1}{2(n-1)^2} \Bigg[ n \times \frac{n-1}{n} \cdot (\sigma_X^4 + \rho^2 \sigma_X^2 \sigma_Y^2) + n \times \frac{n-1}{n} \cdot (\sigma_Y^4 + \rho^2 \sigma_X^2 \sigma_Y^2) \Bigg] \\[6pt] &= \frac{1}{2(n-1)} \Bigg[ (\sigma_X^4 + \rho^2 \sigma_X^2 \sigma_Y^2) + (\sigma_Y^4 + \rho^2 \sigma_X^2 \sigma_Y^2) \Bigg] \\[6pt] &= \frac{\sigma_X^4 + \sigma_Y^4 + 2 \rho^2 \sigma_X^2 \sigma_Y^2}{2(n-1)}. \\[12pt] \end{align}$$

Using the equal variance assumption we have $\sigma_X = \sigma_Y = \sigma$ so the moments reduce to:

$$\mathbb{E} \bigg( \frac{S_\text{pooled}^2}{\sigma^2} \bigg) = 1 \quad \quad \quad \mathbb{V} \bigg( \frac{S_\text{pooled}^2}{\sigma^2} \bigg) = \frac{1+\rho^2}{n-1}.$$

It is usual to approximate the distribution of the quadratic form by a scaled chi-squared distribution using the method of moments. Equating the first two moments to that distribution gives the variance requirement $\mathbb{V}(S_\text{pooled}^2/\sigma^2) = 2/\nu$, which then gives the degrees-of-freedom parameter:

$$\nu = \frac{2(n-1)}{1+\rho^2}.$$

Bear in mind that the degrees-of-freedom parameter here depends on the true correlation coefficient $\rho$, and you may need to estimate this using the sample correlation in your actual problem.

Thank you for the clear and well-documented derivation. – Denis Cousineau Aug 11 '20 at 17:33 — Denis Cousineau, Aug 11 '20 at 17:33

Distribution of the pooled variance in paired samples

1 Answers1

Linked