
I'm working on this identity $$\sum_{i=1}^n (y_i - \hat {\beta_0} - \hat {\beta_1}x_i)^2 = \sum_{i=1}^n y_i^2 - \hat {\beta_0}\sum_{i=1}^n y_i - \hat {\beta_1} \sum_{i=1}^n x_iy_i$$

I have these relationships to work with:

$$\hat {\beta_1} = \frac { n\sum_{i=1}^n x_iy_i - \left ( \sum_{i=1}^n x_i \right ) \left ( \sum_{i=1}^n y_i \right )}{ n \left ( \sum_{i=1}^n x_i^2 \right ) -\left (\sum_{i=1}^n x_i \right )^2 }$$

$$ \hat {\beta_0}= \overline {y} - \hat {\beta_1} \overline {x}$$

A little manipulation also shows

$$\hat {\beta_1}= \frac { \sum_{i=1}^n ( x_i - \overline {x}) y_i}{ \left ( \sum_{i=1}^n x_i^2 \right ) -n \overline {x}^2 }$$
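
Specifically, dividing the numerator and denominator of the first formula by $n$ and using $\sum_{i=1}^n \bar{x}\,y_i = n\bar{x}\bar{y}$ gives

$$\hat {\beta_1}=\frac{\sum_{i=1}^n x_iy_i-n\overline{x}\,\overline{y}}{\sum_{i=1}^n x_i^2-n\overline{x}^2}=\frac{\sum_{i=1}^n (x_i-\overline{x})y_i}{\sum_{i=1}^n x_i^2-n\overline{x}^2}.$$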

My strategy has been to substitute $\hat {\beta_1}$ out of the equation, but I keep ending up with $\overline {y}$ in too many terms and no terms involving $y_i$. The formula for $\hat {\beta_1}$ seems too complicated to work with directly. What am I missing?
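
For what it's worth, a quick numerical check (a minimal sketch using numpy, with made-up data) confirms that the identity itself does hold:

```python
import numpy as np

# Made-up data purely to sanity-check the identity; any x, y of equal length works.
rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 2.0 + 3.0 * x + rng.normal(size=20)
n = len(x)

# Least-squares coefficients from the formulas above.
b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
b0 = y.mean() - b1 * x.mean()

lhs = np.sum((y - b0 - b1 * x) ** 2)
rhs = np.sum(y ** 2) - b0 * np.sum(y) - b1 * np.sum(x * y)
print(lhs, rhs, np.isclose(lhs, rhs))  # the two values agree
```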

user92612
  • By selecting suitable units in which to measure the $x_i$ and the $y_i$ you can make the $x_i$ sum to $0$, the $y_i$ sum to $0$, the $x_i^2$ sum to $n$, and the $y_i^2$ sum to $n$. The formulas for $\hat\beta_0$ and $\hat\beta_1$ will greatly simplify with no loss in generality. – whuber May 14 '14 at 20:47
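
For concreteness, with $\sum x_i=\sum y_i=0$ and $\sum x_i^2=n$ one has $\overline{x}=\overline{y}=0$, so the formulas above reduce to

$$\hat{\beta_0}=0,\qquad \hat{\beta_1}=\frac{1}{n}\sum_{i=1}^n x_iy_i,$$

and the identity to be proved collapses to

$$\sum_{i=1}^n\big(y_i-\hat{\beta_1}x_i\big)^2=\sum_{i=1}^n y_i^2-\hat{\beta_1}\sum_{i=1}^n x_iy_i,$$

which follows by expanding the square and noting $\hat{\beta_1}^2\sum_{i=1}^n x_i^2=n\hat{\beta_1}^2=\hat{\beta_1}\sum_{i=1}^n x_iy_i$.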

3 Answers


You are trying to reprove the Pythagorean Theorem. Understanding the connection provides a powerful intuition for understanding ordinary least squares regression and (incidentally) makes short work of the proof.


Let $y$ represent the vector $(y_1, y_2, \ldots, y_n)$, $\mathbf{1}$ the $n$-vector $(1,1,\ldots,1)$, and $x$ the vector $(x_1,x_2, \ldots, x_n)$. Denote by $\hat{y}$ the vector $\hat\beta_0\mathbf{1} + \hat\beta_1 x.$ In this notation the identity (after combining the last two sums) is

$$ ||y-\hat y||^2 = \sum_{i=1}^n (y_i - \hat {\beta_0} - \hat {\beta_1}x_i)^2 = \sum_{i=1}^n y_i^2 - \sum_{i=1}^n\left(\hat {\beta_0} + \hat {\beta_1} x_i\right)y_i = ||y||^2 - \hat y \cdot y.$$
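
(Writing the dot product out coordinate-wise shows how it packages the last two sums of the original right-hand side:

$$\hat y\cdot y=\sum_{i=1}^n\big(\hat{\beta_0}+\hat{\beta_1}x_i\big)y_i=\hat{\beta_0}\sum_{i=1}^n y_i+\hat{\beta_1}\sum_{i=1}^n x_iy_i.)$$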

Because $\hat\beta_0$ and $\hat\beta_1$ are the least-squares coefficients, by definition $\hat{y}$ is the vector of the form $\beta_0\mathbf{1}+\beta_1 x$ that minimizes the squared distance $||y-\hat{y}||^2$. Because $y$ and $\hat{y}$ span a space of at most two dimensions, understanding their relationship is a matter of planar Euclidean geometry, which is faithfully illustrated with a simple diagram:

[Figure: right triangle in the plane spanned by $y$ and $\hat y$, with legs $\hat y$ and $y-\hat y$ and hypotenuse $y$]

The formulae for $\hat\beta_0$ and $\hat\beta_1$ are usually derived by demonstrating the geometrically obvious fact that $y-\hat y$ must be perpendicular to $\hat y$. In terms of vector operations, this means their dot product is zero:

$$\hat y \cdot (y - \hat y) = 0.$$

Expanding this dot product shows it is equivalent to the key relationship

$$||\hat y||^2 = \hat y \cdot \hat y = \hat y \cdot y.$$
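
For completeness: in coordinates, the perpendicularity $\hat y\cdot(y-\hat y)=0$ is just a combination of the two normal equations (the first-order conditions for least squares),

$$\mathbf{1}\cdot(y-\hat y)=\sum_{i=1}^n\big(y_i-\hat{\beta_0}-\hat{\beta_1}x_i\big)=0,\qquad x\cdot(y-\hat y)=\sum_{i=1}^n\big(y_i-\hat{\beta_0}-\hat{\beta_1}x_i\big)x_i=0,$$

so that

$$\hat y\cdot(y-\hat y)=\hat{\beta_0}\,\mathbf{1}\cdot(y-\hat y)+\hat{\beta_1}\,x\cdot(y-\hat y)=0.$$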

Consider the Pythagorean Theorem. In the right triangle with legs $\hat y$ and $y-\hat y$ and hypotenuse $y$, it asserts that the square of one leg equals the square of the hypotenuse minus the square of the other leg:

$$||y-\hat y||^2 = ||y||^2 - ||\hat y||^2.$$

The key relationship provides another expression for the last term, yielding

$$||y-\hat y||^2 = ||y||^2 - \hat y \cdot y,$$

which is the desired identity, QED.

whuber
  • I'm having trouble following this derivation. How do you know $y$ and $\hat{y}$ span only 2 dims? How do you know that $\hat{y}$ and $y - \hat{y}$ are perpendicular/dot product=0? – Mitch Jan 04 '17 at 21:42
  • @Mitch (1) Two vectors and their common origin are three points. Assuming they are distinct, Euclid states they determine a plane. The graphic here is fully general; the only case it does not accurately depict is when $y$ and $\hat y$ are collinear: they span just *one* (or zero) dimension. (2) The perpendicularity follows from the least squares criterion: the squared residual length is minimized when the dot product is zero. This is geometrically clear: if the angle were not ninety degrees, you could shorten the distance, and thereby reduce the squared distance, by changing the projection. – whuber Jan 04 '17 at 21:47
  • @ayorgo I don't think so, because "$\hat y$" on the right hand side is undefined. – whuber Aug 17 '19 at 13:56

OK, I will do some parts and leave the rest for you to finish yourself. I drop the index of summation for simplicity. Start by expanding the L.H.S. to get $$L.H.S.=\sum y_i^2+\sum(\hat{\beta_0}+\hat{\beta_1}x_i)^2-2\hat{\beta_0}\sum y_i-2\hat{\beta_1}\sum x_iy_i,$$ which is $$L.H.S.=\Big[\sum y_i^2-\hat{\beta_0}\sum y_i-\hat{\beta_1}\sum x_iy_i\Big]+\sum(\hat{\beta_0}+\hat{\beta_1}x_i)^2-\hat{\beta_0}\sum y_i-\hat{\beta_1}\sum x_iy_i.$$ What is inside the brackets is exactly the R.H.S., so you need to show that the rest is zero, i.e. $$\sum(\hat{\beta_0}+\hat{\beta_1}x_i)^2-\hat{\beta_0}\sum y_i-\hat{\beta_1}\sum x_iy_i=0.$$ To show this, here are some hints; carry them out carefully, step by step (a worked sketch follows the list):

  1. Replace $\hat{\beta_0}$ by $\bar{y}-\hat{\beta_1}\bar{x}$ to re-write it all based on $\hat{\beta_1}$.
  2. Expand the terms and simplify (some terms will be canceled out).
  3. Now use two facts:
    (1): $\hat{\beta_1}=\dfrac{S_{xy}}{S_{xx}},$ where $S_{xy}=\sum(x_i-\bar{x})(y_i-\bar{y})$ and $S_{xx}=\sum(x_i-\bar{x})^2$ and
    (2): $S_{xy}=\sum x_iy_i-\dfrac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$
    to write everything in terms of $S_{xy}$ and $S_{xx}$.
  4. Simplify to show that it is zero.
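
In case it is useful for checking your work, here is a sketch of how the steps play out: substituting $\hat{\beta_0}=\bar{y}-\hat{\beta_1}\bar{x}$ and using $\sum(x_i-\bar{x})=0$,

$$\sum\big(\bar{y}+\hat{\beta_1}(x_i-\bar{x})\big)^2-\big(\bar{y}-\hat{\beta_1}\bar{x}\big)\sum y_i-\hat{\beta_1}\sum x_iy_i = n\bar{y}^2+\hat{\beta_1}^2S_{xx}-n\bar{y}^2+n\hat{\beta_1}\bar{x}\bar{y}-\hat{\beta_1}\sum x_iy_i$$

$$=\hat{\beta_1}^2S_{xx}-\hat{\beta_1}\Big(\sum x_iy_i-n\bar{x}\bar{y}\Big)=\hat{\beta_1}^2S_{xx}-\hat{\beta_1}S_{xy}=\hat{\beta_1}\big(\hat{\beta_1}S_{xx}-S_{xy}\big)=0,$$

since $\hat{\beta_1}S_{xx}=S_{xy}$.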
Stat

Alternatively, one can show that $$\sum(\hat{\beta_0}+\hat{\beta_1}x_i)^2-\hat{\beta_0}\sum y_i-\hat{\beta_1}\sum y_ix_i=0$$ by regrouping the left-hand side as $$-\hat{\beta_0} \sum(y_i-\hat{\beta_0}-\hat{\beta_1}x_i)-\hat{\beta_1}\sum (y_i-\hat{\beta_0}-\hat{\beta_1}x_i)x_i$$ and invoking the minimisation argument: $$0=\frac{\partial}{\partial{\hat{\beta_0}}}\sum(y_i-\hat{\beta_0}-\hat{\beta_1}x_i)^2=-2\sum(y_i-\hat{\beta_0}-\hat{\beta_1}x_i),$$ $$0=\frac{\partial}{\partial{\hat{\beta_1}}}\sum(y_i-\hat{\beta_0}-\hat{\beta_1}x_i)^2=-2\sum(y_i-\hat{\beta_0}-\hat{\beta_1}x_i)x_i,$$ so both sums vanish and the expression is zero.

ayorgo