
I have that $R^{2} = 1 - \frac{\text{RSS}}{\sum_{i=1}^{n}(Y_{i}-\bar{Y})^{2}}$, where $\text{RSS} = \sum_{i=1}^{n}(Y_{i}-\hat{Y}_{i})^{2}$; for the simplest linear model, with only the intercept term, $\hat{Y}_{i} = \bar{Y}$. I also know that $\frac{1}{n}\sum_{i=1}^{n}(Y_{i}-\bar{Y})^{2}$ is the total variance of the data for the intercept-only model, and that $\frac{\text{RSS}}{\frac{1}{n}\sum_{i=1}^{n}(Y_{i}-\bar{Y})^{2}}$ is, roughly, $\frac{\text{variance of the model's errors}}{\text{total variance}}$.

However, I still don't get why $R^{2}$ is the proportion of the total variance of the data explained by the model.

python_learner
  • Have a look at my answer here (especially to Question 2): http://stats.stackexchange.com/questions/256726/linear-regression-what-does-the-f-statistic-r-squared-and-residual-standard-err/256821#256821 Does that help? – Stefan Feb 08 '17 at 19:35

2 Answers


There is an error in your equations: $RSS = \sum_i (Y_i - \hat{Y}_i)^2$.

Maybe it would help to step back from the equations for a moment.

RSS is the sum of the squared residuals: basically, all of the variance that the model can't explain.

Therefore $\frac{RSS}{\sum{(Y_i - \bar{Y})^2}}$ is $\frac{\text{unexplained variance}}{\text{sum of all variance}}$

so

$1 - \frac{\text{unexplained variance}}{\text{sum of all variance}} = \frac{\text{sum of all variance} - \text{unexplained variance}}{\text{sum of all variance}} = \frac{\text{explained variance}}{\text{sum of all variance}}$
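As a quick numeric check, here is a minimal sketch with made-up data (`np.polyfit` is just one convenient way to fit a least-squares line); it computes $R^2$ both as $1 - \text{RSS}/\text{TSS}$ and as explained over total, and the two agree:

```python
import numpy as np

# Made-up data for illustration; any roughly linear data would do.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit a least-squares line to the data.
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

rss = np.sum((y - y_hat) ** 2)         # unexplained variation
tss = np.sum((y - y.mean()) ** 2)      # sum of all variation
ess = np.sum((y_hat - y.mean()) ** 2)  # explained variation

print(1 - rss / tss)  # R^2 as 1 - unexplained/total
print(ess / tss)      # R^2 as explained/total: same number
```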

Does this help?

  • Hi, could you please explain what you mean by the 'model can't explain'? – python_learner Feb 08 '17 at 19:56
  • So the residuals are the errors left after fitting, for each $y_i$. If the residual for $y_0$ is 0, then the model has perfectly predicted $y_0$ (in the observed data) and thus can completely explain/predict $y_0$. If the residual is not 0, then there is some variance in $y_0$ that the model is not explaining. Therefore, $\sum(y_i - \hat{y}_i)^2$ is the sum of the squared differences between the actual and the predicted values, i.e. the sum of what is not explained by the model – Conrad De Peuter Feb 08 '17 at 20:05
  • In your definition you have $\frac{1}{n}$ as a normalizing term for your variance. When considering proportions of variance you should either include that in both the numerator and denominator or neither, otherwise one may be orders of magnitude larger than the other. Try considering your definition of total variance without the normalizing term. – Conrad De Peuter Feb 09 '17 at 20:13
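To see the last comment's point about the $\frac{1}{n}$ factor numerically, here is a small sketch with made-up numbers (`y_hat` here is just a stand-in for a model's fitted values, not a real fit):

```python
import numpy as np

# Made-up response values and stand-in "fitted" values, only to
# illustrate the normalization point from the comment above.
rng = np.random.default_rng(0)
y = rng.normal(size=100)
y_hat = 0.9 * y  # hypothetical predictions, not an actual model fit
n = len(y)

rss = np.sum((y - y_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)

print(rss / tss)              # sums in both numerator and denominator
print((rss / n) / (tss / n))  # means in both: the 1/n cancels, same ratio
print(rss / (tss / n))        # 1/n in the denominator only: off by a factor of n
```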

We have $TSS = \sum_i (Y_i - \bar{Y})^2,\ RSS = \sum_i(Y_i - \hat{Y}_i)^2,\ ESS = \sum_i(\hat{Y}_i - \bar{Y})^2$

$TSS$ - total variance, $RSS$ - residual variance, $ESS$ - regression variance

From the ANOVA identity (which holds for least-squares fits that include an intercept) we know that

$$TSS = RSS + ESS$$

So we have $R^2 = 1 - \frac{RSS}{TSS} = \frac{TSS - RSS}{TSS} = \frac{ESS}{TSS}$. From the last equation you can clearly see that $R^2$ measures how much of the total variance is explained by the regression.
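A short sketch of this identity on simulated data (hypothetical numbers; `np.polyfit` is just one way to get an OLS line with an intercept):

```python
import numpy as np

# Simulated data: a noisy line, so the OLS fit includes an intercept
# (the ANOVA identity TSS = RSS + ESS relies on that).
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(size=50)

slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

tss = np.sum((y - y.mean()) ** 2)
rss = np.sum((y - y_hat) ** 2)
ess = np.sum((y_hat - y.mean()) ** 2)

print(np.isclose(tss, rss + ess))  # True: TSS = RSS + ESS
print(1 - rss / tss, ess / tss)    # both equal R^2
```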

Łukasz Grad