
I was reading about regression metrics in the scikit-learn manual. Even though each metric has its own formula, I cannot tell intuitively what the difference is between $R^2$ and the explained variance score, and therefore when to use one or the other to evaluate my models.

Ferdi
hipoglucido

2 Answers

  1. $R^2 = 1 - \frac{SSE}{TSS}$, where $SSE$ is the sum of squared errors and $TSS$ is the total sum of squares.

  2. $\text{explained variance score} = 1 - \mathrm{Var}[\hat{y} - y]\, /\, \mathrm{Var}[y]$, where $\mathrm{Var}$ is the biased variance, i.e. $\mathrm{Var}[\hat{y} - y] = \frac{1}{n}\sum(\text{error} - \text{mean}(\text{error}))^2$. Compared with $R^2$, the only difference comes from the $\text{mean}(\text{error})$ term: if $\text{mean}(\text{error}) = 0$, then $R^2$ equals the explained variance score.

  3. Also note that adjusted $R^2$ uses the unbiased variance estimate.
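The difference between the two metrics can be seen numerically with a small made-up example (the data below is arbitrary, chosen only to illustrate the biased vs. zero-mean-error cases):

```python
import numpy as np
from sklearn.metrics import explained_variance_score, r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Systematically biased predictions: every error is +1, so mean(error) != 0.
# The explained variance score ignores the constant offset, R^2 does not.
y_biased = y_true + 1.0
print(r2_score(y_true, y_biased))                  # 0.2
print(explained_variance_score(y_true, y_biased))  # 1.0

# Zero-mean errors: the two metrics coincide.
y_unbiased = y_true + np.array([-0.1, 0.1, -0.1, 0.1])
print(r2_score(y_true, y_unbiased))                  # 0.992
print(explained_variance_score(y_true, y_unbiased))  # 0.992
```

A large gap between the two scores is therefore a hint that the model's errors have a nonzero mean, i.e. the predictions are systematically biased.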

Dean
    sklearn doesn't have adjusted-R2 does it? – Hack-R Jun 08 '17 at 15:05
  • @Hack-R actually [it does](https://scikit-learn.org/stable/modules/model_evaluation.html#r2-score-the-coefficient-of-determination) – mMontu Dec 24 '18 at 17:33
  • @mMontu That is R2. – vasili111 Feb 25 '21 at 17:26
  • @Hack-R For adjusted R2 see this: https://stackoverflow.com/questions/51023806/how-to-get-adjusted-r-square-for-linear-regression and https://stackoverflow.com/questions/49381661/how-do-i-calculate-the-adjusted-r-squared-score-using-scikit-learn/49381947 – vasili111 Feb 25 '21 at 17:27

Dean's answer is right.

However, I think there is a minor typo there: $\mathrm{Var}[\hat{y}-y] = \sum(\text{error}^2 - \text{mean}(\text{error}))\,/\,n$.

I guess it should be $\mathrm{Var}[\hat{y}-y] = \sum(\text{error} - \text{mean}(\text{error}))^2\,/\,n$.

My reference is the sklearn source code here: https://github.com/scikit-learn/scikit-learn/blob/bf24c7e3d/sklearn/metrics/_regression.py#L396
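As a sanity check on the corrected formula, it can be reproduced with plain NumPy, since `np.var` defaults to the biased $1/n$ estimator (the toy values below are made up):

```python
import numpy as np
from sklearn.metrics import explained_variance_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

error = y_true - y_pred
# explained variance = 1 - Var[y - y_hat] / Var[y], with biased (1/n) variances
evs_manual = 1 - np.var(error) / np.var(y_true)

print(evs_manual)                                  # ~0.9572
print(explained_variance_score(y_true, y_pred))    # same value
```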

Siong Thye Goh
Seraph