
Why is that? Why does $TSS = ESS + RSS$ hold only if we have an intercept (constant term) in our regression model? Why does it fail when the model doesn't include an intercept?

Daniel Yefimov
  • See this post. http://stats.stackexchange.com/questions/233256/why-i-am-getting-different-r2-from-r-lm-and-manual-calculation – Haitao Du Nov 28 '16 at 16:23
  • @hxd1011 thank you for the comment. But it only states the fact that the calculation of $R^2$ differs in the two cases. I know that. I want to know, technically, why it is true. – Daniel Yefimov Nov 28 '16 at 16:26

2 Answers


To expand on @hxd1011's linked-to answer in the comments, \begin{align*} \text{TSS} &= \sum_i(y_i - \bar{y})^2 \\ &= \sum_{i}(y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2\\ &= \sum_{i}(y_i - \hat{y}_i)^2 + \sum_i (\hat{y}_i - \bar{y})^2 + 2\sum_i(y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) \\ &= \text{ESS} + \text{RSS} + 2\sum_i(y_i - \hat{y}_i)(\hat{y}_i - \bar{y}). \end{align*} (Here $\text{ESS} = \sum_i(y_i - \hat{y}_i)^2$ is the error sum of squares and $\text{RSS} = \sum_i(\hat{y}_i - \bar{y})^2$ is the regression sum of squares; some texts swap these names.) @hxd1011 is telling you that sometimes this cross term is $0$, and sometimes it is not.

For simplicity, let's say we have only one predictor. With an intercept, taking the derivatives of $\sum_i(y_i - \hat{y}_i)^2 = \sum_i(y_i - \beta_0 - \beta_1 x_i)^2$ with respect to both $\beta_0$ and $\beta_1$ implies that $\sum_i(y_i - \hat{y}_i) = 0$ and $\sum_i(y_i - \hat{y}_i)x_i = 0$. These, taken together, show that

\begin{align*} \sum_i(y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) &= \sum_i(y_i - \hat{y}_i)\hat{y}_i - \bar{y} \sum_i(y_i - \hat{y}_i) \\ &= \hat{\beta}_0\sum_i(y_i - \hat{y}_i) + \hat{\beta}_1 \sum_i(y_i - \hat{y}_i)x_i- \bar{y} \sum_i(y_i - \hat{y}_i) \end{align*} is zero. So with an intercept, the cross term cancels out.
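The two normal equations can be checked numerically. Here is a small illustrative sketch in Python/NumPy (synthetic data and variable names are mine, not from the answer):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = 1.0 + 2.0 * x + rng.normal(size=30)

# least squares fit with an intercept column
X = np.column_stack([np.ones_like(x), x])
beta0, beta1 = np.linalg.lstsq(X, y, rcond=None)[0]
res = y - (beta0 + beta1 * x)

# both normal equations hold: the residuals are orthogonal
# to the intercept column and to x (up to floating point)
print(np.sum(res))
print(np.sum(res * x))
```

Both printed sums are zero up to rounding error, which is exactly what makes the cross term cancel.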

However, if you don't have an intercept, you take only one derivative (with respect to $\beta_1$), and setting it equal to $0$ gives $\sum_i(y_i - \beta_1 x_i)x_i = 0$. This alone does not tell you that the residuals sum to $0$, so the cross term does not cancel:

\begin{align*} \sum_i(y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) &= \sum_i(y_i - \hat{y}_i)\hat{y}_i - \bar{y} \sum_i(y_i - \hat{y}_i) \\ &= \hat{\beta}_1 \sum_i(y_i - \hat{y}_i)x_i - \bar{y} \sum_i(y_i - \hat{y}_i) \\ &= - \bar{y} \sum_i(y_i - \hat{y}_i). \end{align*} This last term is generally nonzero, so without an intercept $\text{TSS} \ne \text{ESS} + \text{RSS}$.
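The whole decomposition can be seen numerically. A hedged NumPy sketch (synthetic data; the helper and names are mine) comparing a fit with an intercept against a fit through the origin:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 3.0 + 2.0 * x + rng.normal(size=50)

def decompose(y, yhat):
    """Return TSS, the two sums of squares, and the cross term."""
    tss = np.sum((y - y.mean()) ** 2)
    ess = np.sum((y - yhat) ** 2)         # error (residual) sum of squares
    rss = np.sum((yhat - y.mean()) ** 2)  # regression sum of squares
    cross = 2 * np.sum((y - yhat) * (yhat - y.mean()))
    return tss, ess, rss, cross

# with an intercept: the cross term vanishes, so TSS = ESS + RSS
X = np.column_stack([np.ones_like(x), x])
yhat_with = X @ np.linalg.lstsq(X, y, rcond=None)[0]
tss, ess, rss, cross = decompose(y, yhat_with)

# without an intercept: the cross term survives, so TSS != ESS + RSS
b = np.sum(x * y) / np.sum(x ** 2)
tss0, ess0, rss0, cross0 = decompose(y, b * x)
```

With the intercept, `cross` is zero up to rounding; without it, `cross0` is far from zero, and the gap between `tss0` and `ess0 + rss0` is exactly that cross term.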

Taylor

Here is my derivation; I hope it is helpful.

To start, let's break down the relationship between TSS, ESS, and RSS.

\begin{eqnarray*} TSS&=&\displaystyle\sum_{i}(y_{i}-\overline{y})^2\\ &=&\displaystyle\sum_{i}((y_{i} - \hat{y}_i)+(\hat{y}_i - \overline{y}))^2\\ &=&\displaystyle\sum_{i}(y_{i} - \hat{y}_i)^2+\displaystyle\sum_{i}(\hat{y}_i - \overline{y})^2+2\displaystyle\sum_{i}(y_{i} - \hat{y}_i)(\hat{y}_i - \overline{y})\\ &=&ESS+RSS+2\displaystyle\sum_{i}(y_{i} - \hat{y}_i)(\hat{y}_i - \overline{y}) \end{eqnarray*}

We can see that there is a cross-term in the equation.

Given that we are using a linear regression model with an intercept (so that $\overline{y} = a + b\overline{x}$, because the residuals of a least squares fit with an intercept sum to zero), we have

\begin{equation} \hat{y}_i = a + bx_i\\ \overline{y} = a + b\overline{x}\\ \hat{y}_i - \overline{y} = b(x_i - \overline{x})\\ y_{i} - \hat{y}_{i} = (y_{i} - \overline{y}) - (\hat{y}_{i} - \overline{y}) = (y_{i} - \overline{y}) - b(x_{i} - \overline{x}) \end{equation}

Now we substitute the above equations into the cross term:

\begin{eqnarray*} 2\displaystyle\sum_{i}(y_{i} - \hat{y}_i)(\hat{y}_i - \overline{y})&=&2b\displaystyle\sum_{i}(y_{i} - \hat{y}_i)(x_i - \overline{x})\\ &=&2b\displaystyle\sum_{i}((y_{i} - \overline{y}) - b(x_{i} - \overline{x}))(x_i - \overline{x})\\ &=&2b\left(\displaystyle\sum_{i}(y_{i} - \overline{y})(x_i - \overline{x}) - b\displaystyle\sum_{i}(x_i - \overline{x})^2\right) \end{eqnarray*}

Finally, when the model has an intercept, the least squares estimate of $b$ is \begin{equation} \hat{b} = \frac{\displaystyle\sum_{i}(x_{i}-\overline{x})(y_{i}-\overline{y})}{\displaystyle\sum_{i}(x_{i}-\overline{x})^2} \end{equation}

Therefore, \begin{eqnarray*} 2\displaystyle\sum_{i}(y_{i} - \hat{y}_i)(\hat{y}_i - \overline{y})&=&2\hat{b}\left(\displaystyle\sum_{i}(y_{i} - \overline{y})(x_i - \overline{x}) - \hat{b}\displaystyle\sum_{i}(x_i - \overline{x})^2\right)\\ &=&2\hat{b}\left(\displaystyle\sum_{i}(y_{i} - \overline{y})(x_i - \overline{x}) - \displaystyle\sum_{i}(y_{i} - \overline{y})(x_i - \overline{x})\right)\\ &=&0 \end{eqnarray*}

But when the model does not have an intercept, the least squares estimate of $b$ is \begin{equation} \hat{b} = \frac{\displaystyle\sum_{i}x_{i}y_{i}}{\displaystyle\sum_{i}x_{i}^2} \end{equation}

In this case, the substitution above does not make the cross term vanish, so in general $TSS \ne ESS + RSS$.
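A quick numerical check of the two slope formulas and the bracketed factor in the cross-term expression (a NumPy sketch with made-up data; `bracket` is my own illustrative helper):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(1, 5, size=40)
y = 1.5 + 0.8 * x + rng.normal(scale=0.3, size=40)

sxy = np.sum((x - x.mean()) * (y - y.mean()))
sxx = np.sum((x - x.mean()) ** 2)

def bracket(b):
    # the factor multiplied by 2b in the cross-term expression
    return sxy - b * sxx

b_with = sxy / sxx                          # least squares slope, with intercept
b_without = np.sum(x * y) / np.sum(x ** 2)  # least squares slope through origin

# with the intercept-model slope the factor is zero (up to rounding);
# with the no-intercept slope it is not, so the cross term survives
print(bracket(b_with), bracket(b_without))
```

Plugging in $\hat{b} = \sum_i(x_i-\overline{x})(y_i-\overline{y})/\sum_i(x_i-\overline{x})^2$ zeroes the factor by construction; the through-origin slope generally does not.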

YungChun