
My question is this: as far as I am aware,
1. the residuals $\hat{u}_i$ are not independent of one another,
2. the variance of the $i$th residual is $\sigma^2\left\{1-\dfrac{1}{n}-\dfrac{(X_i-\overline{X})^2}{\sum_j(X_j-\overline{X})^2}\right\}$.

So since the $\hat{u}_i$ are not independent and their variances do not equal $\sigma^2$, each $\hat{u}_i^2/\sigma^2$ cannot be the square of an i.i.d. standard normal variable. How, then, can the sum of the $\hat{u}_i^2/\sigma^2$ be distributed chi-squared?

(I'm not familiar with matrix notation, so I would appreciate it if you could keep the answer within the simple regression context.)
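Here is a quick simulation sketch illustrating both points numerically (a minimal example in numpy; the model and parameter values are arbitrary choices):

```python
# Simulate many datasets from y = a + b*x + eps and compare the empirical
# variance of each residual with sigma^2 * (1 - 1/n - (x_i - xbar)^2 / s_xx);
# also check that two residuals are correlated (hence not independent).
import numpy as np

rng = np.random.default_rng(0)
n, sigma, reps = 6, 1.5, 200_000
x = np.linspace(0.0, 1.0, n)        # fixed, non-stochastic design
xc = x - x.mean()
sxx = np.sum(xc**2)

y = 1.0 + 2.0 * x + rng.normal(0.0, sigma, (reps, n))
bhat = (y * xc).sum(axis=1) / sxx   # slope estimate; uses sum(xc) = 0
ahat = y.mean(axis=1) - bhat * x.mean()
res = y - ahat[:, None] - bhat[:, None] * x

print(res.var(axis=0).round(3))                          # empirical Var(u_i)
print((sigma**2 * (1 - 1/n - xc**2 / sxx)).round(3))     # theoretical values
print(np.corrcoef(res[:, 0], res[:, 1])[0, 1].round(3))  # nonzero: dependent
```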


1 Answer


The correct expression for the variance of the $i$th residual is explained here in detail.

I am using slightly different notation, but in the end it will all match what you are working with.

Suppose we have a simple linear regression model $$y=\alpha+\beta x+\epsilon$$

where $\alpha+\beta x$ is the part of $y$ explained by $x$ and $\epsilon$ is the unexplained part or the error. Here $y$ is stochastic and $x$ is non-stochastic.

We consider the paired observations $(x_i,y_i)$ for $i=1,2,\ldots,n$ and assume that the $\epsilon_i$ are i.i.d. $\mathcal N(0,\sigma^2)$. This means $Y_i\sim\mathcal N(\alpha+\beta x_i,\sigma^2)$, independently for all $i$.

Define $$s_{xx}=\sum_{i=1}^n (x_i-\bar x)^2\qquad,\qquad s_{yy}=\sum_{i=1}^n (y_i-\bar y)^2$$ and $$s_{xy}=\sum_{i=1}^n (x_i-\bar x)(y_i-\bar y)$$

From the normal equations, we have the least squares estimates of $\alpha$ and $\beta$:

$$\hat\alpha=\bar y-\hat\beta \bar x\qquad,\qquad\hat\beta=\frac{s_{xy}}{s_{xx}}$$

Let the residual variance be $$s^2=\frac{1}{n-2}\sum_{i=1}^n(y_i-\hat\alpha-\hat\beta x_i)^2$$

On simplification, using the normal equations $\sum_{i=1}^n(y_i-\hat\alpha-\hat\beta x_i)=0$ and $\sum_{i=1}^n x_i(y_i-\hat\alpha-\hat\beta x_i)=0$ in the first step (they imply $\sum_{i=1}^n(y_i-\hat\alpha-\hat\beta x_i)(\hat\alpha+\hat\beta x_i)=0$, so squaring each residual and multiplying it by $y_i$ give the same sum),

\begin{align} (n-2)s^2&=\sum_{i=1}^n (y_i-\hat\alpha-\hat\beta x_i)y_i \\&=\sum_{i=1}^n \left\{(y_i-\bar y)-\hat\beta(x_i-\bar x)\right\}y_i \\&=\sum_{i=1}^n (y_i-\bar y)y_i-\hat\beta \sum_{i=1}^n (x_i-\bar x)y_i \\&=\sum_{i=1}^n (y_i-\bar y)^2-\hat\beta\sum_{i=1}^n (x_i-\bar x)(y_i-\bar y) \\&=s_{yy}-\hat\beta s_{xy} \\&=s_{yy}-\hat\beta^2 s_{xx} \end{align}
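As a sanity check, here is a minimal numerical sketch (numpy; the data-generating values are arbitrary) of the chain $(n-2)s^2=s_{yy}-\hat\beta s_{xy}=s_{yy}-\hat\beta^2 s_{xx}$:

```python
# Verify the algebraic identity on one simulated dataset.
import numpy as np

rng = np.random.default_rng(1)
n = 15
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, n)

xc, yc = x - x.mean(), y - y.mean()
sxx, sxy, syy = np.sum(xc**2), np.sum(xc * yc), np.sum(yc**2)
bhat = sxy / sxx
ahat = y.mean() - bhat * x.mean()

rss = np.sum((y - ahat - bhat * x)**2)             # (n-2) * s^2
print(rss, syy - bhat * sxy, syy - bhat**2 * sxx)  # all three coincide
```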


Taking $\alpha'=\alpha+\beta\bar x$, the joint pdf of $Y=(Y_1,\ldots,Y_n)$ at $(y_1,\ldots,y_n)\in\mathbb R^n$ is

$$f_{Y}(y_1,\ldots,y_n)=\frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^n\left(y_i-\alpha'-\beta(x_i-\bar x)\right)^2\right]$$

Consider the orthogonal transformation $(y_1,\ldots,y_n)\to(z_1,\ldots,z_n)$ such that

$$\begin{pmatrix}z_1\\z_2\\\vdots\\z_n\end{pmatrix}=\mathbf Q\begin{pmatrix}y_1\\y_2\\\vdots\\y_n\end{pmatrix}\,,$$

where $$\mathbf Q=\left[\begin{matrix}\frac{1}{\sqrt n}&\frac{1}{\sqrt n}&\cdots&\frac{1}{\sqrt n}\\\frac{x_1-\bar x}{\sqrt{s_{xx}}}&\frac{x_2-\bar x}{\sqrt{s_{xx}}}&\cdots&\frac{x_n-\bar x}{\sqrt{s_{xx}}}\\\vdots&\vdots&\cdots&\vdots\end{matrix}\right]$$ is an $n\times n$ orthogonal matrix with its first two rows fixed.

Then, $$z_1=\frac{1}{\sqrt n}\sum_{i=1}^n y_i=\sqrt{n}\bar y$$ and $$z_2=\frac{\sum (x_i-\bar x)y_i}{\sqrt{s_{xx}}}=\frac{s_{xy}}{\sqrt{s_{xx}}}=\hat\beta\sqrt{s_{xx}}$$
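The unspecified rows of $\mathbf Q$ never need to be written down explicitly; any completion of the first two rows to an orthonormal basis will do. Here is a sketch (using numpy's QR factorization as one arbitrary way to complete the basis, with example values) verifying these expressions for $z_1$ and $z_2$:

```python
# Build a full orthogonal Q whose first two rows are the fixed ones,
# then check z1 = sqrt(n)*ybar and z2 = beta_hat*sqrt(sxx).
import numpy as np

rng = np.random.default_rng(2)
n = 8
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, n)

xc = x - x.mean()
sxx = np.sum(xc**2)
r1 = np.ones(n) / np.sqrt(n)   # first fixed row of Q
r2 = xc / np.sqrt(sxx)         # second fixed row, orthogonal to r1

# complete {r1, r2} to an orthonormal basis: QR of [r1 r2 | random columns]
M = np.column_stack([r1, r2, rng.normal(size=(n, n - 2))])
Qfull, _ = np.linalg.qr(M)
Qfull[:, 0] *= np.sign(Qfull[:, 0] @ r1)   # undo possible sign flips from QR
Qfull[:, 1] *= np.sign(Qfull[:, 1] @ r2)
Q = Qfull.T

z = Q @ y
bhat = np.sum(xc * y) / sxx
print(z[0], np.sqrt(n) * y.mean())   # z1 = sqrt(n) * ybar
print(z[1], bhat * np.sqrt(sxx))     # z2 = beta_hat * sqrt(sxx)
print(np.sum(z**2), np.sum(y**2))    # the orthogonal map preserves the norm
```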

Note that $\sum\limits_{i=1}^n y_i^2=\sum\limits_{i=1}^n z_i^2$ by virtue of orthogonal transformation, which leads to

\begin{align} \sum_{i=1}^n(y_i-\alpha'-\beta(x_i-\bar x))^2&=\sum_{i=1}^n y_i^2+n\alpha'^2+\beta^2\sum_{i=1}^n(x_i-\bar x)^2-2\alpha'n\bar y-2\beta\sum_{i=1}^n(x_i-\bar x)y_i \\&=\sum_{i=1}^n z_i^2 +n\alpha'^2+\beta^2 s_{xx}-2\alpha'\sqrt n z_1-2\beta z_2\sqrt{s_{xx}} \\&=(z_1-\sqrt n\alpha')^2+(z_2-\beta\sqrt{s_{xx}})^2+\sum_{i=3}^nz_i^2 \end{align}

For $(z_1,\ldots,z_n)\in\mathbb R^n$, joint density of $Z=(Z_1,\ldots,Z_n)$ becomes

$$f_{Z}(z_1,\ldots,z_n)=\frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left[-\frac{1}{2\sigma^2}\left\{(z_1-\sqrt n\alpha')^2+(z_2-\beta\sqrt{s_{xx}})^2+\sum_{i=3}^nz_i^2\right\}\right]\,,$$

so that $Z_1,Z_2,\ldots,Z_n$ are independently distributed with

\begin{align} Z_1&\sim\mathcal N(\sqrt n\alpha',\sigma^2) \\Z_2&\sim\mathcal N(\beta\sqrt{s_{xx}},\sigma^2) \\Z_i&\sim\mathcal N(0,\sigma^2)\qquad,\,i=3,4,\ldots,n \end{align}

Now,

\begin{align} (n-2)s^2&=s_{yy}-\hat\beta^2s_{xx} \\&=\sum_{i=1}^ny_i^2-n\bar y^2-\hat\beta^2s_{xx} \\&=\sum_{i=1}^nz_i^2-z_1^2-z_2^2 \\&=\sum_{i=3}^nz_i^2 \end{align}
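The same construction lets us check this chain numerically; a short sketch (the signs produced by the QR completion are irrelevant here, since only the squares $z_i^2$ enter):

```python
# Check that the residual sum of squares equals sum of z_i^2 for i >= 3.
import numpy as np

rng = np.random.default_rng(3)
n = 12
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, n)   # arbitrary example values

xc = x - x.mean()
sxx = np.sum(xc**2)
bhat = np.sum(xc * y) / sxx
ahat = y.mean() - bhat * x.mean()
rss = np.sum((y - ahat - bhat * x)**2)      # (n-2) * s^2

# complete the two fixed rows of Q to an orthonormal basis, as before
M = np.column_stack([np.ones(n) / np.sqrt(n), xc / np.sqrt(sxx),
                     rng.normal(size=(n, n - 2))])
z = np.linalg.qr(M)[0].T @ y
print(rss, np.sum(z[2:]**2))                # equal up to floating-point error
```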

And you have that $Z_3,\ldots,Z_n\sim\mathcal N(0,\sigma^2)$, independently.

This implies $$\sum_{i=3}^n\frac{Z_i^2}{\sigma^2}\sim\chi^2_{n-2}$$

Or in other words, $$\frac{(n-2)s^2}{\sigma^2}\sim\chi^2_{n-2}$$
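To see the distributional claim empirically, here is a small Monte Carlo sketch (arbitrary parameter values) comparing the simulated mean and variance of $(n-2)s^2/\sigma^2$ with the $\chi^2_{n-2}$ values $n-2$ and $2(n-2)$:

```python
# Simulate (n-2)s^2/sigma^2 many times and compare its moments with chi^2_{n-2}.
import numpy as np

rng = np.random.default_rng(4)
n, sigma, reps = 10, 1.5, 100_000
x = np.linspace(0.0, 1.0, n)
xc = x - x.mean()
sxx = np.sum(xc**2)

y = 1.0 + 2.0 * x + rng.normal(0.0, sigma, (reps, n))
bhat = (y * xc).sum(axis=1) / sxx
ahat = y.mean(axis=1) - bhat * x.mean()
rss = ((y - ahat[:, None] - bhat[:, None] * x)**2).sum(axis=1)

stat = rss / sigma**2
print(stat.mean(), n - 2)        # chi^2_{n-2} mean: n - 2
print(stat.var(), 2 * (n - 2))   # chi^2_{n-2} variance: 2(n - 2)
```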

StubbornAtom
  • Oh wow... now I see that I should have first asked why the df is actually $n-2$, not $n$... Thank you so much! – user8931048 Aug 18 '18 at 09:18
  • @user8931048 One could actually ask why $s^2$ is defined with $n-2$ instead of $n$. I think it has to do with $s^2$ being unbiased for $\sigma^2$. – StubbornAtom Aug 18 '18 at 09:36
  • How was $z_3$ calculated? Its term was not specified in the transformation matrix. – Parthiban Rajendran Nov 25 '18 at 07:22
  • @PaariVendhan The remaining entries of the matrix $Q$ (only the first two rows are specified) are so chosen that $Q$ becomes orthogonal. There can be several choices of the remaining entries and it does not matter what exactly they are for our purpose. I have omitted parts of the 'simplification' in the proof; if you write it down then it should be clear. – StubbornAtom Nov 25 '18 at 08:05
  • Since I am not yet into matrix/vector representations of RVs, I am struggling to understand this derivation. I have thus created a related alternate question [here](https://stats.stackexchange.com/q/378407/202481), where I tried an alternate, simpler path but got stuck. Can you kindly check that? – Parthiban Rajendran Nov 25 '18 at 08:09
  • @PaariVendhan Actually this is what I call a proof *without* matrix algebra. The matrix here is just representing the transformation. If you want, you can omit the matrix and directly define $z_1,z_2$ and say $z_3,\ldots,z_n$ are suitable linear combinations of the $y_i$'s such that the transformation becomes orthogonal. And since this result involves multiple random variables, a vector representation is only natural and in itself is not such a big deal. – StubbornAtom Nov 25 '18 at 08:19
  • Can you please share a detailed reference for this derivation so I can read on? Otherwise it would be too many questions to ask here, extending the comments. – Parthiban Rajendran Nov 25 '18 at 08:34
  • Hi @StubbornAtom, can you please tell me why we multiplied $\mathbf{y}$ by an orthogonal matrix $\mathbf{Q}$ and not by some other, not necessarily orthogonal, matrix? I've read that the norm of a vector is invariant under multiplication by an orthogonal matrix. What is the intuition behind the orthogonal transformation here? Thanks in advance. – Goldman Clarck May 01 '20 at 08:12
  • @GoldmanClarck Yes, orthogonal transformation preserves norm, which is being used here: $$\sum_{i=1}^n Z_i^2=Z^TZ=(QY)^T(QY)=Y^TQ^TQY=Y^TY=\sum_{i=1}^n Y_i^2$$ In addition, if $Y$ has a multivariate normal distribution $N(\mu,\sigma^2 I_n)$, then $QY\sim N(Q\mu,\sigma^2I_n)$ for orthogonal $Q$. So normality is also preserved. – StubbornAtom May 01 '20 at 13:33
  • Thank you so much for the clarification @StubbornAtom. – Goldman Clarck May 01 '20 at 17:56
  • Why is $\sum_{i=1}^n (y_i-\hat\alpha-\hat\beta x_i)y_i$ the same as $\sum_{i=1}^n(y_i-\hat\alpha-\hat\beta x_i)^2$? – Mariana Sep 05 '21 at 12:43