
The claim is that $$(n-1)S^2/\sigma^2$$ is chi-squared distributed with $n-1$ degrees of freedom.

$(n-1)S^2/\sigma^2$ can be written as $$\sum_{i=1}^n \left(\frac {x_i-\mu}{\sigma}\right)^2-\left(\frac {\bar x-\mu}{\sigma/\sqrt n}\right)^2$$

I am almost there with understanding why this is $\chi^2_{n-1}$ distributed. I understand that each standardized term $\frac{x_i-\mu}{\sigma}$ is $N(0,1)$ distributed, and that the sum of squares of $n$ independent $N(0,1)$ variables is $\chi^2_{n}$ distributed.

But my problem is that $\bar x$ is not independent of the $x_i$. How do we take this fact into account to derive the desired conclusion?
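
(To be clear, I don't doubt the claim; a quick Monte Carlo check, sketched below with numpy/scipy and an arbitrary choice of $n$, $\mu$ and $\sigma$, agrees with it. What I'm after is the rigorous derivation.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, mu, sigma = 5, 10.0, 2.0     # arbitrary illustrative values
reps = 100_000

x = rng.normal(mu, sigma, size=(reps, n))
q = (n - 1) * x.var(axis=1, ddof=1) / sigma**2   # (n-1) S^2 / sigma^2

# Should match chi^2_{n-1}: mean n-1 = 4, variance 2(n-1) = 8
print(q.mean(), q.var())
print(stats.kstest(q, stats.chi2(df=n - 1).cdf))  # large p-value expected
```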

Note that the existing answers I've found did not specifically address that question.

EDIT: Note that I am not asking for an explanation of why we write $n-1$ rather than $n$. I am asking specifically how we can rigorously derive that it has the distribution that it has.

EDIT 2: Those who have marked this question as a duplicate of this one may be misunderstanding my question. I am not asking for an explanation of why the degrees of freedom are $n-1$ rather than $n$. I am asking for a derivation that the statistic is chi-squared distributed in the first place, and that it has $n-1$ degrees of freedom. My problem is clear from the question: how do we take the dependence on $\bar x$ into account? I'm not asking for an intuitive explanation of why it has $n-1$ df rather than $n$.

kjetil b halvorsen
user56834
    It is also covered in numerous X validated questions, like [this one](https://stats.stackexchange.com/q/121662/7224). And follows from the quadratic transform $$x^\text{T}(\mathbf{I}-\frac{1}{n}\mathbf{J})x$$ of the original Normal (standardised) vector $x$. – Xi'an Nov 12 '17 at 16:18
  • @Xi'an, I specifically asked this question because the answer to the question you refer to does not address the specific point that I don't understand, namely how to take into account the dependence between $\bar x$ and the $x_i$'s – user56834 Nov 12 '17 at 16:20
  • It is the sum of squares of independent normal distributions. The dependency is taken into account by the loss of 1 degree of freedom. – Michael R. Chernick Nov 12 '17 at 18:48
  • @michael, sure, that is plausible. But how is this derived? – user56834 Nov 13 '17 at 04:39
  • After reading the duplicate I have to agree with you that your question is different. – Michael R. Chernick Nov 13 '17 at 16:47
  • @Programmer2134, I understand your frustration. However, please be careful of your tone in your comments & edits. Productive conversations are only possible when our [be nice](https://stats.stackexchange.com/help/be-nice) policy is followed. – gung - Reinstate Monica Nov 13 '17 at 17:09
  • The candidate duplicate does not cover the derivation, & while the other linked thread does cover the derivation, it explicitly sidesteps the question here, stating, "(about which, see Cochran's theorem)". Thus, I am reopening this question. – gung - Reinstate Monica Nov 13 '17 at 17:14
  • Wasn't this question just answered at https://stats.stackexchange.com/questions/312337/easy-proof-of-sum-i-1n-leftz-i-barz-right2-sim-chi2-n-1/312471#312471? – whuber Nov 13 '17 at 21:26
  • @whuber, ah yes, this is closer. I just have one question about your answer which I've left as a comment. – user56834 Nov 14 '17 at 06:15

1 Answer


I am mostly reproducing the argument of this excellent wiki post.

Let $\xi_i \sim \mathcal{N}(\mu, \sigma^2)$ be $n$ independent, identically distributed normal random variables.

Denote the sample mean $\bar{\xi} = \frac{\sum \limits_{i=1}^{n} \xi_i}{n}$.

Also denote the sample variance $S^2 = \frac{1}{n-1} \sum \limits_{i=1}^{n} (\xi_i - \bar{\xi})^2$.

Suppose that we knew the exact expectation $\mu$. Then let us construct the sum of squares of our standardized samples:

$\sum \limits_{i=1}^n \frac{(\xi_i - \mu)^2}{\sigma^2} \sim \chi^2_n$ (the sum of squares of $n$ i.i.d. standardized normal variables is chi-squared distributed with $n$ degrees of freedom)
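
(A minimal numerical illustration of this fact, assuming numpy/scipy and an arbitrary choice of parameters:)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, mu, sigma = 7, 3.0, 1.5      # arbitrary illustrative values
reps = 200_000

xi = rng.normal(mu, sigma, size=(reps, n))
q = (((xi - mu) / sigma) ** 2).sum(axis=1)    # uses the known mu

print(q.mean(), q.var())                      # ~n = 7 and ~2n = 14
print(stats.kstest(q, stats.chi2(df=n).cdf))  # consistent with chi^2_n
```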

Let us add and subtract the sample mean inside the sum of squares:

$\sum \limits_{i=1}^n \frac{(\xi_i - \mu)^2}{\sigma^2} = \sum \limits_{i=1}^n \frac{(\xi_i - \bar{\xi} + \bar{\xi} - \mu)^2}{\sigma^2} = \sum \limits_{i=1}^n \left(\frac{(\xi_i-\bar{\xi})^2}{\sigma^2} + \underbrace{2 \frac{(\xi_i - \bar{\xi})(\bar{\xi} - \mu)}{\sigma^2}}_{0 \text{ due to }\sum \limits_{i=1}^n (\xi_i - \bar{\xi}) = 0} + \frac{(\bar{\xi} - \mu)^2}{\sigma^2}\right) = (n-1)\frac{S^2}{\sigma^2} + n\frac{(\bar{\xi} - \mu)^2}{\sigma^2}$

$\sum \limits_{i=1}^n \frac{(\xi_i - \mu)^2}{\sigma^2} = (n-1)\frac{S^2}{\sigma^2} + n\frac{(\bar{\xi} - \mu)^2}{\sigma^2}$, where $\sum \limits_{i=1}^n \frac{(\xi_i - \mu)^2}{\sigma^2} \sim \chi^2_n$ and $n\frac{(\bar{\xi} - \mu)^2}{\sigma^2} \sim \chi^2_1$ (since $\bar{\xi} \sim \mathcal{N}(\mu, \sigma^2/n)$, so $\sqrt{n}(\bar{\xi} - \mu)/\sigma \sim \mathcal{N}(0,1)$).

By Cochran's theorem, the first term of the sum (a random variable that is a function of the sample variance $S^2$) is independent of the second term (a function of the sample mean $\bar{\xi}$); thus, the probability density function of $\sum \limits_{i=1}^n \frac{(\xi_i - \mu)^2}{\sigma^2}$ is the convolution of the probability density functions of $(n-1)\frac{S^2}{\sigma^2}$ and $n\frac{(\bar{\xi} - \mu)^2}{\sigma^2}$.
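
(This independence is special to the normal distribution. A simulation can illustrate it, though of course not prove it; the following sketch, assuming numpy and arbitrary parameters, shows that the sample correlation between $\bar{\xi}$ and $S^2$ is essentially zero and that $S^2$ behaves the same whether $\bar{\xi}$ is small or large:)

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu, sigma = 6, 0.0, 1.0      # arbitrary illustrative values
reps = 200_000

xi = rng.normal(mu, sigma, size=(reps, n))
xbar = xi.mean(axis=1)
s2 = xi.var(axis=1, ddof=1)

# Independence implies zero correlation (the converse does not hold in general)
print(np.corrcoef(xbar, s2)[0, 1])           # ~0

# S^2 has the same mean on both halves of the xbar distribution
below = xbar < np.median(xbar)
print(s2[below].mean(), s2[~below].mean())   # both ~ sigma^2 = 1
```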

Now we can either use the convolution formula directly or apply one of the spectral tools (moment-generating functions/cumulants or characteristic functions/Fourier transforms) to derive the distribution of $S^2$.

The Fourier transform of a convolution of functions is the product of their Fourier transforms. Thus, $\phi_{\sum \limits_{i=1}^n \frac{(\xi_i - \mu)^2}{\sigma^2}}(t) = \phi_{(n-1)\frac{S^2}{\sigma^2}}(t) \cdot \phi_{n\frac{(\bar{\xi} - \mu)^2}{\sigma^2}}(t)$.

The characteristic function of the $\chi^2_n$ distribution is $\phi_{\chi^2_n}(t) = (1-2it)^{-\frac{n}{2}}$.

Thus, the characteristic function of $(n-1)\frac{S^2}{\sigma^2}$ is $\phi_{(n-1)\frac{S^2}{\sigma^2}}(t) = (1-2it)^{-\frac{n}{2}} \cdot (1-2it)^{\frac{1}{2}} = (1-2it)^{-\frac{n-1}{2}}$. But this is the characteristic function of the $\chi^2_{n-1}$ distribution (a characteristic function uniquely determines its distribution, so equality of characteristic functions implies equality of distributions).

Hence, $(n-1)\frac{S^2}{\sigma^2} \sim \chi^2_{n-1}$.
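(As a final sanity check, the empirical characteristic function of $(n-1)\frac{S^2}{\sigma^2}$, estimated by simulation, matches the closed form $(1-2it)^{-\frac{n-1}{2}}$; a minimal sketch assuming numpy and arbitrary parameters:)

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 5, 2.0               # arbitrary illustrative values
reps = 300_000

xi = rng.normal(0.0, sigma, size=(reps, n))
q = (n - 1) * xi.var(axis=1, ddof=1) / sigma**2

for t in (0.1, 0.3, 0.7):
    empirical = np.exp(1j * t * q).mean()          # Monte Carlo estimate of E[exp(itQ)]
    closed_form = (1 - 2j * t) ** (-(n - 1) / 2)   # CF of chi^2_{n-1}
    print(t, empirical, closed_form)
```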

Boris Burkov