
Regarding simple linear regression $y = a + bx + \epsilon$, where $\epsilon$ is uncorrelated, E$[\epsilon]=0$, and Var$[\epsilon]=\sigma^2$: the residual sum of squares is defined as $SS_{Res}=\Sigma\epsilon_i^2$, with expected value E$[SS_{Res}]=(n-2)\sigma^2$.

Where am I going wrong with the following naive derivation:

E$[SS_{Res}]=$ E$[\Sigma\epsilon_i^2]$

$=\Sigma$E$[\epsilon_i^2]$        since E$[a+b]=$ E$[a]+$E$[b]$

$=\Sigma($E$[\epsilon_i])^2$        since E$[ab]=$ E$[a]$E$[b]$   if   Cov$[a,b]=0$

$=n($E$[\epsilon_i])^2$

$=0$            since E[ $\epsilon_i$ ] $= 0$ by initial assumption

cwackers
  • What's $Cov[\epsilon_i, \epsilon_i]$? – jlimahaverford Oct 10 '15 at 05:31
  • $\epsilon_i$ does not have covariance zero with itself. – Christoph Hanck Oct 10 '15 at 05:38
  • $E(\epsilon^2)$ is **not** equal to $E(\epsilon)^2$, and the covariance $Cov(a,a)$ is not zero. –  Oct 10 '15 at 07:28
  • If $a,b$ are uncorrelated, then their correlation coefficient is zero, which is equivalent to Cov$[a,b]=0$. Since $\epsilon$ is assumed to be uncorrelated, Cov$[\epsilon,\epsilon]=0$. On the other hand, since a vector cannot be orthogonal to itself, I don't see how Cov$[a,a]$, $a \neq 0$, can ever equal zero. Thus I do not understand what Montgomery, Peck and Vining 5e mean when they use the word uncorrelated in their phrase "$\epsilon$ are uncorrelated" when setting up the simple linear regression model. – cwackers Oct 10 '15 at 16:20
  • They can say that Cov($a,a$) = 0 because $a$ is a constant. Note that it is the true intercept in the model, not the estimated intercept from your regression analysis. – AlaskaRon Oct 10 '15 at 20:58
  • As you have realised by now, you have misunderstood 'uncorrelated'. Can you please provide the actual quote, because "$\epsilon$ is uncorrelated" doesn't make much sense (any random variable is correlated with itself); it has to be uncorrelated with another variable (e.g. $x$). – seanv507 Oct 10 '15 at 21:59
  • @AlaskaRon, I'm sorry, the $a$ and $b$ in my comment are not supposed to be the same as those in the regression model. Since I cannot edit my comment at this point, I'll just rewrite it and place it in a new comment. – cwackers Oct 10 '15 at 22:05
  • If random variables $v,w$ are uncorrelated, then their correlation coefficient is zero, which is equivalent to Cov$[v,w]=0$. Since $\epsilon$ is assumed to be uncorrelated, Cov$[\epsilon,\epsilon]=0$. On the other hand, since a vector cannot be orthogonal to itself, I don't see how Cov$[v,v]$ can ever equal zero for any $v \neq 0$. Thus I do not understand what Montgomery, Peck and Vining 5e mean by the word "uncorrelated" in their phrase "$\epsilon$ are uncorrelated" when setting up the simple linear regression model. – cwackers Oct 10 '15 at 22:09
  • @seanv507 From Montgomery, Peck, and Vining, 5th ed, page 12 *"The simple linear regression model is $y=\beta_0+\beta_1x+\epsilon$ ........ $\epsilon$ is a random error component ........ We usually assume that the errors are uncorrelated ........ Furthermore, because the errors are uncorrelated, the responses are uncorrelated."* Later in the book, page 19, they use the uncorrelated nature of $y_i$ to distribute the variance operator across a summation: *"Var$[\Sigma c_i y_i]$ = $\Sigma c_i^2$Var$[y_i]$ because the observations $y_i$ are uncorrelated"* – cwackers Oct 10 '15 at 22:17
  • So the point is that you have to have some other variable to define a correlation, e.g. the error at $x=1$ vs the error at $x=2$, or the error at the 1st sample point $\epsilon_1$ vs the error at the 2nd sample point $\epsilon_2$. So "observations $y_i$ are uncorrelated" means the correlation of $y_i, y_j$ is $0$ unless $i=j$, in which case it is $1$. – seanv507 Oct 10 '15 at 22:43
  • I've been thinking of $y_i$ as a scalar. For example, a data set with $n$ data points $\{x_i,y_i\}$ for $i=1,2,3,...n$. But if I struggle, I can also see $y_i$ as a vector. For example, run an experiment $m$ times with $x$ held fixed at $x_{i=k}$, yielding the data set $\{x_i,y_{i,j}\}$ for $i=k$ and $j=1,2,3,...m$. I'll have to think more about this. – cwackers Oct 10 '15 at 23:20
  • The fundamental problem is that you have to very, very distinctly write out the (true) probability model and then the analysis. Mixing analogous terms between them is causing you trouble (along with $E(\epsilon^2) \not= E(\epsilon)^2$). For instance, the model can be written thusly: $y_i = a + bx_i+\epsilon_i$ where $\epsilon_1,...,\epsilon_n$ are independent, all with variance $\sigma^2$ and mean 0. (continued) – AlaskaRon Oct 11 '15 at 02:53
  • ... The analysis is: fit the line $\hat{y_i} = \hat{a} + \hat{b}x_i$ where $\hat{a}$ and $\hat{b}$ are estimators chosen to minimize the sum of squared residuals, where a residual is defined as $e_i = y_i - \hat{y}_i$. You can't treat $SS_{Res} = \sum e_i^2$ as the same as $\sum \epsilon_i^2$. Incidentally, one thing that shows the residuals are not uncorrelated with each other is that the sum of the residuals in least squares regression HAS to be zero, so increasing one residual will necessarily decrease other residuals. – AlaskaRon Oct 11 '15 at 02:59
  • Line 3 of your proof is false! Please check it carefully; the relevant identity is spelled out just below these comments. – Yusto Oct 10 '15 at 12:39
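
As these comments point out, the derivation breaks at line 3: Cov$[\epsilon_i,\epsilon_i]$ equals Var$[\epsilon_i]=\sigma^2$, not zero, so E$[\epsilon_i^2]$ is not $($E$[\epsilon_i])^2$. Spelling that step out with the standard variance identity (a generic check, not a quote from the book):

$$\mathrm{E}[\epsilon_i^2] = \mathrm{Var}[\epsilon_i] + \big(\mathrm{E}[\epsilon_i]\big)^2 = \sigma^2 + 0 = \sigma^2,$$

so $\Sigma_i\,$E$[\epsilon_i^2] = n\sigma^2$ rather than $(n-2)\sigma^2$, which already suggests that $SS_{Res}$ cannot literally be $\Sigma\epsilon_i^2$; the answer below explains what $SS_{Res}$ actually is.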

1 Answer


The sum of squared residuals, $SS_{Res}$, is NOT the sum of the squared epsilons (the true errors). The epsilons are the unobserved, independent $N(0,\sigma^2)$ random errors. $SS_{Res}$ is the sum of the squared RESIDUALS. The residuals, $e_i = y_i - b_0 - b_1 x_i$, are often used as proxies for the true errors, but they are not equal to the true errors in the model. For instance, the residuals do not have variance $\sigma^2$; in fact, they usually do not even have the same variance, and they are not independent! So the derivation is wrong from the first line.
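
To make the distinction concrete, here is a minimal simulation sketch (the sample size, $\sigma$, and true coefficients are made-up values, and `np.polyfit` is just one convenient way to get the least-squares line). It repeatedly draws fresh errors, fits the line, and compares the average of $SS_{Res}=\sum e_i^2$ with $(n-2)\sigma^2$, and the average of $\sum \epsilon_i^2$ with $n\sigma^2$:

```python
import numpy as np

# Made-up settings for illustration: n = 20 points, sigma = 2, arbitrary true line.
rng = np.random.default_rng(0)
n, sigma = 20, 2.0
a, b = 1.0, 0.5
x = np.linspace(0.0, 10.0, n)

ss_res, ss_eps = [], []
for _ in range(20000):
    eps = rng.normal(0.0, sigma, size=n)      # true errors (unobservable in practice)
    y = a + b * x + eps
    b_hat, a_hat = np.polyfit(x, y, 1)        # least-squares slope and intercept
    e = y - (a_hat + b_hat * x)               # residuals: proxies for eps, not eps itself
    ss_res.append(np.sum(e ** 2))
    ss_eps.append(np.sum(eps ** 2))

print(np.mean(ss_res), (n - 2) * sigma ** 2)  # both close to 72: E[SS_Res] = (n-2) sigma^2
print(np.mean(ss_eps), n * sigma ** 2)        # both close to 80: E[sum eps_i^2] = n sigma^2
```

The gap of $2\sigma^2$ is exactly the two degrees of freedom spent estimating the intercept and slope.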

AlaskaRon
  • Thank you for correctly pointing out that I've confused the authors' usages of $\epsilon$ and $e_i$. I need to rethink this starting from line 1, as you have pointed out. For a given sample, I can see that Var$[e_i]\neq \sigma^2$ and that Var$[e_i]$ is not constant with $x$. However, I am unable to see how the $e_i$ are not independent. That quality would seem to be inherited from the $y_i$, which inherit it from the $\epsilon_i$, which are assumed to be uncorrelated and iid. Perhaps I do not understand what it means for the $\epsilon_i$ to be uncorrelated and iid. – cwackers Oct 10 '15 at 17:12
  • Yes, it is strange that residuals are correlated with each other. You can see this experimentally by plotting a scatterplot, then fit a line to it. If you then raise one of the data values above the line in the Y direction (increasing its residual), the least squares line will move upward also, so that all of the other residuals will change. – AlaskaRon Oct 10 '15 at 21:13
  • @AlaskaRon, I can visualize the experiment you suggest, but it is not clear to me that the correct interpretation of that experiment is that it demonstrates the $e_i$ are correlated. – cwackers Oct 10 '15 at 22:28
  • That experiment certainly shows the $e_i$ are not independent! Since that doesn't necessarily imply nonzero correlation, a (standard) calculation is needed to demonstrate the correlation really is not zero. – whuber Nov 10 '15 at 15:27
  • Under the usual regression model, where ${\bf X}$ has 1s in the first column and $x_i$ in the 2nd column, and model ${\bf Y = X\beta + \epsilon}$ where $\epsilon$ consists of independent $N(0,\sigma^2)$ errors, the least squared estimator of $\beta$ is ${\bf (X'X)^{-1}X'Y}$. Under these assumptions, the covariance of the residuals is proportional to the identity matrix minus the [hat matrix](https://en.wikipedia.org/wiki/Hat_matrix): $\sigma^2({\bf I - X(X'X)^{-1}X'})$, so the covariances can be directly read off of the off-diagonal elements of this matrix. – AlaskaRon Nov 11 '15 at 05:44
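
Following that last comment, here is a small numerical sketch of the covariance formula (the design points and $\sigma=1$ below are made up for illustration). It forms the hat matrix ${\bf H = X(X'X)^{-1}X'}$ and prints $\sigma^2({\bf I - H})$; the unequal diagonal entries and nonzero off-diagonal entries show that the residuals have different variances and are correlated:

```python
import numpy as np

sigma = 1.0
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # made-up design points
X = np.column_stack([np.ones_like(x), x])  # intercept column and x

H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix H = X (X'X)^{-1} X'
cov_e = sigma ** 2 * (np.eye(len(x)) - H)  # Cov(e) = sigma^2 (I - H)

print(np.round(cov_e, 3))
# Diagonal: Var(e_i) = sigma^2 (1 - h_ii), not sigma^2 and not all equal.
# Off-diagonal: nonzero, so the residuals are correlated.
# Each row sums to zero, consistent with the residuals summing to zero.
```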