
Consider the general linear regression model: $$y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \cdots + \beta_px_{ip} + \epsilon_i = \mathbf{x}_i^T \beta + \epsilon_i$$

where $\mathbf{x}_i = (1,x_{i1},x_{i2},\cdots,x_{ip})^T$, $\beta=(\beta_0,\beta_1,\cdots,\beta_p)^T$ and the $\epsilon_i$ are iid $N(0,\sigma^2)$.

I would like to see a complete proof of the following identity from first principles: $$\sum_{i=1}^n(y_i - \bar{y})^2 = \sum_{i=1}^n(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^n(y_i - \hat{y}_i)^2$$ where $\hat{y}_i= \mathbf{x}_i^T \hat{\beta}$ ($\hat{\beta}$ is the least squares estimator and $\bar{y}$ is the sample mean of the $y_i$).

I know that the two terms on the right can be obtained by subtracting and adding $\hat{y}_i$ on the left side. But this introduces a "cross term": $$\sum_{i=1}^n2(\hat{y}_i - \bar{y})(y_i - \hat{y}_i)$$

Many texts claim that this is zero, but I have not seen a general proof of this statement. How can this be shown?

Comp_Warrior

1 Answer


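Dropping the constant factor of 2, split the cross term like so:

$\sum_{i=1}^n (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)$

$=\sum_{i=1}^n \hat{y}_i (y_i -\hat{y}_i)-\bar{y} \sum_{i=1}^n (y_i - \hat{y}_i) $

$=\sum_{i=1}^n \hat{y}_i e_i -\bar{y} \sum_{i=1}^n e_i $

(where $e_i = y_i - \hat{y}_i$ is the $i$-th residual)

$=\sum_{i=1}^n \hat{y}_i e_i$

(the residuals sum to zero because the model includes an intercept)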

Can you do it from there?
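
As a numerical sanity check of the split above (not a proof), here is a minimal sketch in Python with NumPy. The simulated data, the seed, the coefficient values, and the helper name `sums_of_squares` are all assumptions made for illustration; the only requirement taken from the question is that the design matrix has a leading column of ones.

```python
import numpy as np

def sums_of_squares(X, y):
    """Fit OLS and return (SST, SSR, SSE, cross term) for the decomposition."""
    # Least squares estimate of beta (hypothetical helper, not from the thread)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_hat          # fitted values
    e = y - y_hat                 # residuals e_i = y_i - y_hat_i
    y_bar = y.mean()
    sst = np.sum((y - y_bar) ** 2)
    ssr = np.sum((y_hat - y_bar) ** 2)
    sse = np.sum(e ** 2)
    cross = 2 * np.sum((y_hat - y_bar) * e)
    return sst, ssr, sse, cross

# Simulated data (assumed values, for illustration only)
rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept column first
beta = np.array([1.0, 2.0, -0.5, 0.3])
y = X @ beta + rng.normal(scale=1.0, size=n)

sst, ssr, sse, cross = sums_of_squares(X, y)
print("cross term is numerically zero:", np.isclose(cross, 0.0, atol=1e-8))
print("SST equals SSR + SSE:", np.isclose(sst, ssr + sse))
```

Both checks should hold up to floating-point error. Removing the column of ones from `X` generally makes the cross term non-zero, which is why the intercept matters for the identity.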

Glen_b
  • Are $e_i$ the empirical residuals? I thought I understood why they sum to zero but I am not sure now. Can you show me explicitly? As for the remaining term, the only thing I can think of is to substitute $\hat{y}_i=\mathbf{x}_i^T \beta$, and I don't think it leads anywhere. – Comp_Warrior May 04 '14 at 09:13
  • Following the first line to the second, clearly $e_i=y_i -\hat{y}_i$. Those are linear regression residuals (and, um yes, residuals are empirical). The substitution you suggest is not an equality. – Glen_b May 04 '14 at 09:52
  • One approach: you should be able to reduce proving $\hat{y}'e=0$ to proving $X'e=0$. From there, you could just substitute something for $e$ and end up with a difference of two terms that must be equal. – Glen_b May 04 '14 at 10:07
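
The approach hinted at in the last comment can be carried through as follows. This is a sketch under one added assumption: the design matrix $X$ (the $n \times (p+1)$ matrix whose rows are $\mathbf{x}_i^T$) has full column rank, so that $\hat{\beta} = (X^TX)^{-1}X^Ty$. The vector notation $y$, $\hat{y} = X\hat{\beta}$, $e = y - \hat{y}$ and $\mathbf{1}$ (the vector of ones) is introduced here and does not appear in the thread.

$$X^Te = X^T(y - X\hat{\beta}) = X^Ty - X^TX(X^TX)^{-1}X^Ty = X^Ty - X^Ty = 0$$

Since the first column of $X$ is $\mathbf{1}$, the first component of $X^Te = 0$ reads

$$\mathbf{1}^Te = \sum_{i=1}^n e_i = 0,$$

and multiplying $X^Te = 0$ on the left by $\hat{\beta}^T$ gives

$$\hat{y}^Te = \hat{\beta}^TX^Te = \sum_{i=1}^n \hat{y}_i e_i = 0.$$

Putting both together, the cross term vanishes:

$$\sum_{i=1}^n 2(\hat{y}_i - \bar{y})(y_i - \hat{y}_i) = 2\left(\sum_{i=1}^n \hat{y}_i e_i - \bar{y}\sum_{i=1}^n e_i\right) = 0.$$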