
In linear regression, I recall seeing (though I forget where) that the sum of squared residuals divided by the total sum of squares equals the square of a correlation between some random variables, which would explain why the ratio is called "R Square".

So which two variables is the correlation between, and why is the ratio the square of that correlation?

What assumptions on the linear regression model are needed for the explanation of "R Square" in terms of correlation?

Thanks!

Tim
  • This is backwards, as the measure concerned would be 1 whenever the residual and total variation were identical, which would correspond to a useless regression. But yes, there is a good reason why r square is so named and this is covered in many places. See e.g. http://stats.stackexchange.com/questions/65779/why-is-coefficient-of-determination-used-to-assess-fit-of-a-least-squares-line or – Nick Cox Jul 28 '13 at 23:23
  • In multiple regression it would be the square of the correlation between fitted and observed. – Glen_b Jul 28 '13 at 23:33
  • @Glen_b: Thanks! What assumptions on the linear regression model are needed for the explanation of "R Square" in terms of correlation? – Tim Jul 29 '13 at 10:26
  • I believe it's an *algebraic* identity. – Glen_b Jul 29 '13 at 22:18

2 Answers

$R^2 = 1 - (\sigma^2 / s^2)$

where $\sigma^2$ is the variance of the residuals and $s^2$ is the variance of the observed response values.
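Which variance estimators go into this formula matters. The following is a minimal sketch on simulated data (the data, seed, and variable names are assumptions made for illustration, not part of the original answer). It suggests that using the same divisor for both variances reproduces $1 - SS_{\text{res}}/SS_{\text{tot}}$, while using degrees-of-freedom-corrected estimates gives the adjusted version.

```python
import numpy as np

# Simulated data: n observations, p predictors (illustrative assumption).
rng = np.random.default_rng(0)
n, p = 50, 2
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=n)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

ss_res = np.sum(resid ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)

# Multiple R^2 from sums of squares.
r2 = 1 - ss_res / ss_tot

# Same divisor (n) in both variances: the divisors cancel, giving r2 again.
r2_var = 1 - np.var(resid) / np.var(y)

# Degrees-of-freedom-corrected variances: this is the adjusted R^2.
sigma2_hat = ss_res / (n - p - 1)
s2 = ss_tot / (n - 1)
r2_adj = 1 - sigma2_hat / s2

print(r2, r2_var, r2_adj)
```

Here `r2` and `r2_var` should agree up to floating-point error, and `r2_adj` should come out slightly smaller.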

Randel
rcorty
  • Alternatively, it can be $R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}$. – Waldir Leoncio Jul 29 '13 at 14:49
  • Actually, $1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}$ (or, alternatively and perhaps more intuitively, $\frac{SS_{\text{reg}}}{SS_{\text{tot}}}$) will give you the multiple $R^2$, whereas $1 - \frac{\sigma^2}{s^2}$ will yield the adjusted $R^2$. – Waldir Leoncio Jul 29 '13 at 15:01
R square is the square of the correlation coefficient between actual response values and predicted values of the model. If actual and predicted values are quite close, this means the model is quite good, and in such a case (assuming linear relation between variables), R square will also be high.
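The claim can be checked numerically. This is a small sketch on simulated data (the data, seed, and variable names are assumptions for illustration). It fits by least squares once with an intercept and once without, and compares $1 - SS_{\text{res}}/SS_{\text{tot}}$ with the squared correlation between observed and fitted values in each case.

```python
import numpy as np

# Simulated data with a clearly nonzero true intercept (illustrative assumption).
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = 5.0 + X @ np.array([1.0, 0.0, -2.0]) + rng.normal(size=n)

def r2_and_corr2(design, y):
    """Return (1 - SS_res/SS_tot, squared correlation of y with fitted values)."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    corr = np.corrcoef(y, fitted)[0, 1]
    return 1 - ss_res / ss_tot, corr ** 2

# Least squares with an intercept column: the two quantities coincide.
with_intercept = r2_and_corr2(np.column_stack([np.ones(n), X]), y)

# Least squares without an intercept: they generally do not.
without_intercept = r2_and_corr2(X, y)

print("with intercept:   ", with_intercept)
print("without intercept:", without_intercept)
```

The equality in the first case relies only on the fit being least squares with an intercept, not on any distributional assumption about the errors.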

It also expresses the ratio of explained to total variation in the response.
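A sketch of why the two descriptions agree, assuming the model is fitted by ordinary least squares and includes an intercept. The notation is introduced here for illustration: $\hat y_i$ are the fitted values, $e_i = y_i - \hat y_i$ the residuals, and $SS_{\text{reg}} = \sum_i (\hat y_i - \bar y)^2$, which is assumed to be nonzero. The normal equations give $\sum_i e_i = 0$ and $\sum_i e_i \hat y_i = 0$, so the fitted values have mean $\bar y$ and

$$
\begin{aligned}
\sum_i (y_i - \bar y)(\hat y_i - \bar y) &= \sum_i (\hat y_i - \bar y)^2 + \sum_i e_i (\hat y_i - \bar y) = SS_{\text{reg}}, \\
r_{y,\hat y}^2 &= \frac{SS_{\text{reg}}^2}{SS_{\text{tot}} \, SS_{\text{reg}}} = \frac{SS_{\text{reg}}}{SS_{\text{tot}}} = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}.
\end{aligned}
$$

The last step uses $SS_{\text{tot}} = SS_{\text{reg}} + SS_{\text{res}}$, which follows from the same orthogonality. No assumption about the distribution of the errors enters the argument.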

Nick Cox
Outlier