If adjusted R squared is superior to R squared, why does statistical software continue to report the latter? Is there any situation in which a researcher might prefer to use R squared instead of adjusted R squared?
-
What kind of regression are you dealing with? If I am not mistaken, for linear regression, there is no difference between the R-squared and the adjusted R-squared. So in this case it is very appropriate to use the plain R-squared value. – alesc Apr 22 '15 at 07:38
-
A linear one. But statistical packages provide both measures, which is why I wonder. – Mike Senin Apr 22 '15 at 07:42
-
Well, according to [Wiki](http://en.wikipedia.org/wiki/Coefficient_of_determination#Adjusted_R2), the equation is a bit different even for linear regression (`p=1`). But the whole point of adjusted R-squared is "_The use of an adjusted R2 is an attempt to take account of the phenomenon of the R2 automatically and spuriously increasing when extra explanatory variables are added to the model._". Linear regression doesn't have any additional explanatory variable, because it is the most primitive type of regression. – alesc Apr 22 '15 at 07:51
-
@alesc, I know that. What I don't know is why to report both values. – Mike Senin Apr 22 '15 at 07:52
-
What are you trying to prove with your R-squared value? Do you compare different regression models? If you compare linear and non-linear regression models, then it would make sense to use adjusted R-squared, otherwise the plain R-square will be sufficient. But then again, you can also use the adjusted R-squared even for linear regression :) I personally would not report both values. So choose one metric and report only that value (either R-square or adjusted R-square). – alesc Apr 22 '15 at 07:56
-
I am not trying to prove anything. It looks like you misread my post: "... statistical software continue to report", not me. – Mike Senin Apr 22 '15 at 08:00
-
The following post may give a good answer to the question of when to use the adjusted or the unadjusted measure: http://stats.stackexchange.com/questions/29185/adjusted-r2-versus-r2-in-multiple-regression – Ruthger Righart Apr 22 '15 at 08:02
-
My best guess would be for legacy reasons. A lot of research uses the plain R-square metric and if you want to make an upgrade or repeat the experiment, you should use the same metrics as the original research (so that you can compare the results). – alesc Apr 22 '15 at 08:03
-
@alesc, do you mean *simple* linear regression when you write linear regression? – Christoph Hanck Apr 22 '15 at 09:52
-
I have had many clients who preferred $R^2$ because they wanted the results to look as good as possible and this is always larger than the adjusted $R^2$ :-). I rarely use $R^2$, for [reasons explained elsewhere](http://stats.stackexchange.com/questions/13314/is-r2-useful-or-dangerous/13317?s=1|0.0000#13317). – whuber Apr 29 '15 at 15:39
2 Answers
Under suitable conditions (explained, for instance, here), $R^2$ measures the proportion of the variance in the dependent variable explained by the regression, which is a natural measure. Adjusted $R^2$ does not have this interpretation, because it modifies the $R^2$ value.
So while adjusted $R^2$ has the indisputable advantage of not increasing automatically when the number of regressors goes up, you pay a price in terms of how you can interpret the measure.
Note that I am not advocating the use of one or the other, just giving a possible reason why people still use the standard $R^2$.
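A small numerical illustration of the trade-off described above (my own sketch, not part of the original answer; it assumes NumPy and ordinary least squares with an intercept, and the helper name is invented): plain $R^2$ never decreases as pure-noise regressors are added, while adjusted $R^2$ penalizes them.

```python
# Illustrative sketch (assumes NumPy; helper name is made up):
# fit OLS with an intercept, then add pure-noise regressors one at a time.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

def r2_and_adjusted(X, y):
    """Return (R^2, adjusted R^2) for an OLS fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])     # intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    ss_res = ((y - X1 @ beta) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    k = X1.shape[1]                                # parameters incl. intercept
    adj = 1.0 - (1.0 - r2) * (len(y) - 1) / (len(y) - k)
    return r2, adj

X = x.reshape(-1, 1)
prev_r2 = -np.inf
for _ in range(5):                                 # append 5 noise columns
    r2, adj = r2_and_adjusted(X, y)
    assert r2 >= prev_r2 - 1e-12                   # plain R^2 never goes down
    assert adj < r2                                # the adjustment lowers it
    prev_r2 = r2
    X = np.column_stack([X, rng.normal(size=n)])
```

The point of the sketch is only mechanical: the added columns are pure noise, yet $R^2$ keeps creeping up, which is exactly the "spurious increase" the adjustment targets.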

-
Quick question: is it perhaps true that $R^2_{adj.}$ is a consistent estimator of the population $R^2$ under some conditions, e.g. a well-specified model? Then it would make sense to report $R^2_{adj.}$ in place of $R^2$. – Richard Hardy Apr 22 '15 at 11:56
-
Yes, but as we can write $R_{adj.}^2=1-\frac{n-1}{n-K}+\frac{n-1}{n-K}R^2$ and, obviously, $\frac{n-1}{n-K}\to1$ (at least when, as is mostly assumed, $K$ remains fixed as $n\to\infty$), we have that $R_{adj.}^2-R^2=o_p(1)$, so that does not seem to be a reason to prefer one over the other. – Christoph Hanck Apr 22 '15 at 12:02
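The identity in this comment is easy to check numerically (a quick sketch of mine, not part of the thread):

```python
# Numeric check of the comment's identity
#   R^2_adj = 1 - (n-1)/(n-K) + (n-1)/(n-K) * R^2,
# and of the fact that the gap R^2 - R^2_adj vanishes as n grows with K fixed.
K, r2 = 4, 0.6
for n in (50, 500, 5000):
    c = (n - 1) / (n - K)
    adj_identity = 1 - c + c * r2
    adj_standard = 1 - (1 - r2) * (n - 1) / (n - K)   # textbook form
    assert abs(adj_identity - adj_standard) < 1e-12
    print(n, r2 - adj_identity)   # gap shrinks roughly like 1/n
```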
-
So perhaps then $R^2_{adj.}$ is unbiased while $R^2$ is biased? I think this could be rather relevant given that $R^2_{adj.}$ is consistent. – Richard Hardy Apr 22 '15 at 12:17
-
Well... do we define population $R^2$ as $1-\sigma^2/Var(y)$? If so, writing $R^2_{adj.}=1-\frac{s^2}{\sum_i(y-\bar{y})^2/(n-1)}$ ($s^2$ the d.f.-adjusted variance estimate dividing by $n-K$) shows that both the estimator of the error variance in the numerator and that of the variance of $y$ in the denominator are unbiased for the respective population parameters, $E(s^2)=\sigma^2$ and $E[\sum_i(y-\bar{y})^2/(n-1)]=Var(y)$. But that does not make the ratio an unbiased estimator of the ratio of the parameters, as the expectation operator does not pass through nonlinear functions in general. – Christoph Hanck Apr 22 '15 at 12:40
-
Thanks. Perhaps I should have posted my comments as a separate question, then I could have upvoted your answers. Since I suspected similar things have been asked, I just hoped for a short confirmation/disconfirmation, comment style. You were more explicit than that, I appreciate it! – Richard Hardy Apr 22 '15 at 13:16
Adjusted R-squared is useful for comparing different regression models. This task cannot be accomplished with R-squared which, as others have already said, has a different informative goal: expressing the proportion of the variance of the dependent variable that is explained by the regression model under investigation.
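As an illustration of that comparison use (my own sketch, not from the answer; it assumes NumPy and OLS with an intercept, and the helper and variable names are invented), adjusted R-squared can be compared across candidate models, whereas plain R-squared mechanically ranks the model with more regressors at least as high:

```python
# Illustrative model comparison by adjusted R^2 (assumes NumPy; names invented).
import numpy as np

rng = np.random.default_rng(1)
n = 80
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                        # irrelevant regressor
y = 1.5 * x1 + rng.normal(size=n)

def adjusted_r2(X, y):
    """Adjusted R^2 of an OLS fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    ss_res = ((y - X1 @ beta) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    k = X1.shape[1]
    return 1 - (1 - r2) * (len(y) - 1) / (len(y) - k)

models = {
    "x1 only": x1.reshape(-1, 1),
    "x1 + irrelevant x2": np.column_stack([x1, x2]),
}
for name, X in models.items():
    print(name, round(adjusted_r2(X, y), 4))
# Plain R^2 would always rank the larger model at least as high;
# adjusted R^2 need not, since the extra column pays a penalty.
```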
