
From Wikipedia (https://en.wikipedia.org/wiki/Residual_sum_of_squares), the RSS is the sum of the squared errors between the true values $y_i$ and the predicted values $\hat y_i$: $\mathrm{RSS} = \sum_i (y_i - \hat y_i)^2$.

Then according to https://en.wikipedia.org/wiki/Total_sum_of_squares, the TSS is the sum of the squared errors between the true values $y_i$ and the average of all $y$: $\mathrm{TSS} = \sum_i (y_i - \bar y)^2$.


However, I don't understand this line under the explanation for TSS:

[...] the total sum of squares equals the explained sum of squares plus the residual sum of squares.


If we plot RSS on the graph, it would look like:

[RSS plot]


TSS Plot:

[TSS plot]


ESS Plot:

[ESS plot]


According to the images, the residual (unexplained) value is actually larger than the TSS. Is there something I'm not following?

user1157751
  • That's why the summation is important. As to why you can partition it, see http://stats.stackexchange.com/questions/258284/linear-regression-why-can-you-partition-sums-of-squares/258308#258308 – Łukasz Grad Mar 06 '17 at 23:20
  • @ŁukaszGrad, I looked at the post and unfortunately I don't really get it. Can you elaborate a bit further? Thanks – user1157751 Mar 06 '17 at 23:24
  • This may also help you: http://stats.stackexchange.com/questions/256726/linear-regression-what-does-the-f-statistic-r-squared-and-residual-standard-err/256821#256821 and this perhaps too: http://stats.stackexchange.com/questions/255973/why-do-the-anova-assumptions-equality-of-variance-normality-of-residuals-matt/256104#256104 And maybe this one: http://stats.stackexchange.com/questions/256344/why-is-correlation-not-very-useful-when-one-of-the-variables-is-categorical/256380#256380 – Stefan Mar 06 '17 at 23:25

2 Answers


You have the total sum of squares being $\displaystyle \sum_i ({y}_i-\bar{y})^2$

which you can write as $\displaystyle \sum_i ({y}_i-\hat{y}_i+\hat{y}_i-\bar{y})^2 $

i.e. as $\displaystyle \sum_i ({y}_i-\hat{y}_i)^2+2\sum_i ({y}_i-\hat{y}_i)(\hat{y}_i-\bar{y}) +\sum_i(\hat{y}_i-\bar{y})^2$ where

  • the first summation term is the residual sum of squares,
  • the second is zero for an ordinary least squares fit with an intercept (if it were not, the residuals would be correlated with the fitted values, suggesting there are better values of $\hat{y}_i$; see the note after this list), and
  • the third is the explained sum of squares
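
For an ordinary least squares fit with an intercept, $\hat{y}_i = a + b x_i$, a short sketch of why the middle term vanishes: the normal equations force the residuals $e_i = y_i - \hat{y}_i$ to satisfy $\sum_i e_i = 0$ and $\sum_i e_i x_i = 0$, so

$\displaystyle \sum_i (y_i-\hat{y}_i)(\hat{y}_i-\bar{y}) = a\sum_i e_i + b\sum_i e_i x_i - \bar{y}\sum_i e_i = 0$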

Since these are sums of squares, they must be non-negative, and so the residual sum of squares cannot exceed the total sum of squares.
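
A quick numerical check of this decomposition (a minimal sketch in Python with NumPy, on made-up data; the names `x`, `y`, `y_hat` are illustrative):

```python
import numpy as np

# Synthetic data: a noisy linear relationship
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 3, 100)

# Ordinary least squares fit: polyfit returns (slope, intercept) for degree 1
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
rss = np.sum((y - y_hat) ** 2)         # residual sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares

print(tss, rss + ess)  # agree up to floating-point error
```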

Henry
  • Hi, thanks for your answer. It may be a bit too late, but can you elaborate on what the 2nd term means mathematically? Looking at the equation, I don't see how to conclude that it needs to be 0. Thanks again! – user1157751 Mar 09 '17 at 07:50
  • Can I say this for the second term: for linear regression, the amounts we overestimate and underestimate cancel out, so when we sum $y_i - \hat y_i$ we get 0? – user1157751 Mar 09 '17 at 18:29
  • Yes you should get $0$ since the average of each should be $\bar{y}$ and so they have the same sums. But my point is subtly different: the residuals should be uncorrelated with the predicted values as a direct result of the linear regression – Henry Mar 09 '17 at 18:35
  • Thanks for your reply again! I don't see the connection between "the residuals should be uncorrelated with the predicted values as a direct result of the linear regression" and any of the terms (1, 2, and 3). Can you give me some hints? Thanks – user1157751 Mar 09 '17 at 18:44
  • $\sum_i ({y}_i-\hat{y}_i)(\hat{y}_i-\bar{y})$ is the covariance of the residuals and the predicted values. Try an ordinary least squares regression and see that it is zero – Henry Mar 09 '17 at 21:15
  • Do you mean $\operatorname{cov}(y_i - \hat y_i,\, \hat y_i)$? Thanks again, I'm trying to work out the math details to fully understand what is going on. – user1157751 Mar 09 '17 at 21:43
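
Following Henry's suggestion, the zero covariance is easy to check numerically (a minimal sketch in Python with NumPy, again on made-up data):

```python
import numpy as np

# Synthetic data for an ordinary least squares fit
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 3.0 - 0.5 * x + rng.normal(0, 1, 50)

slope, intercept = np.polyfit(x, y, 1)
fitted = slope * x + intercept
residuals = y - fitted

# Sample covariance between residuals and fitted values: ~0 for OLS
print(np.cov(residuals, fitted)[0, 1])
```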

This chart was very helpful for me: [chart illustrating how the total sum of squares splits into the explained and residual sums of squares]
