
Say that I am performing a multiple linear regression with 3 variables. If I want to say that two of these variables account for some percentage of the observed variance in the third variable, should I use my $R^2$ value or my adjusted $R^2$ value?

I understand that the adjusted $R^2$ value accounts for the fact that I have more predictors (as compared to a regression of only two variables), but I'm wondering how that translates to my interpretation of the variance in these variables.

mkt
  • Are you asking about this in the context of assessing multicollinearity? E.g., working up to computing the VIF? – gung - Reinstate Monica May 24 '18 at 12:55
  • Possible duplicate of [How to split r-squared between predictor variables in multiple regression?](https://stats.stackexchange.com/questions/60872/how-to-split-r-squared-between-predictor-variables-in-multiple-regression) – Michael R. Chernick May 29 '18 at 04:52
  • @MichaelChernick it is unclear to me whether the OP wants to split $R^2$ or whether the question is about the choice between adjusted and unadjusted $R^2$. Perhaps Matthew can edit to clarify? – mdewey May 29 '18 at 08:45

2 Answers


If you want to describe how much of the total variance in $X_1$ is explained by $X_2$ and $X_3$ using a linear model, then use $R^2$, which by definition gives exactly this quantity.
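As a minimal sketch of that definition, assuming NumPy and simulated data (the helper name `r_squared` and the coefficients are hypothetical, chosen only for illustration): regress $X_1$ on $X_2$ and $X_3$ by ordinary least squares with an intercept, then compute $R^2 = 1 - SS_\text{res}/SS_\text{tot}$, the share of the sample variance of $X_1$ accounted for by the fit.

```python
import numpy as np

def r_squared(y, X):
    """Hypothetical helper: OLS fit of y on X plus an intercept; returns 1 - SS_res / SS_tot."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Simulated data (an assumption): X1 depends linearly on X2 and X3 plus noise.
rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 1.5 * x2 - 0.8 * x3 + rng.normal(size=n)

r2 = r_squared(x1, np.column_stack([x2, x3]))
print(f"Share of the sample variance of X1 explained by X2 and X3: {r2:.3f}")
```

With a single predictor this quantity reduces to the squared Pearson correlation between $X_1$ and that predictor, which is a handy sanity check.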

Save the adjusted $R^2$ for when you want to assess whether it is worthwhile to include yet another variable, say $X_4$, in an attempt to model $X_1$ more closely, since (regular) $R^2$ never decreases when more variables are added.
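To illustrate, here is a sketch under stated assumptions: NumPy, simulated data, and a hypothetical extra predictor $X_4$ that is pure noise. It uses the standard adjustment $\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}$, where $n$ is the number of observations and $p$ the number of predictors (intercept not counted). The helper names are illustrative only.

```python
import numpy as np

def r_squared(y, X):
    """Hypothetical helper: OLS fit of y on X plus an intercept; returns the sample R^2."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    ss_res = np.sum((y - design @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(1)
n = 30
x2, x3 = rng.normal(size=n), rng.normal(size=n)
x1 = 1.5 * x2 - 0.8 * x3 + rng.normal(size=n)
x4 = rng.normal(size=n)  # hypothetical extra predictor: pure noise, unrelated to X1

r2_small = r_squared(x1, np.column_stack([x2, x3]))
r2_big = r_squared(x1, np.column_stack([x2, x3, x4]))
adj_small = adjusted_r_squared(r2_small, n, p=2)
adj_big = adjusted_r_squared(r2_big, n, p=3)

print(f"R^2:          {r2_small:.3f} -> {r2_big:.3f}  (cannot decrease)")
print(f"adjusted R^2: {adj_small:.3f} -> {adj_big:.3f}  (can decrease)")
```

Because the smaller model is nested in the larger one, least squares can always reproduce the smaller fit by giving $X_4$ a zero coefficient, so $R^2$ cannot drop; the adjusted value only rises if $X_4$ reduces the residual sum of squares by more than the penalty for the extra parameter.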

You might want to read the Wikipedia page on the coefficient of determination, which includes a section on the use of adjusted $R^2$.

mkt
  • My reading of "If I want to say that two of these variables account for some percentage of the observed variance in the third variable" is that the OP is asking about $R^2$ for, say, $X_1$ as a function of $X_2$ & $X_3$, which is what's discussed in this answer. AFAICT, this answer is on point. – gung - Reinstate Monica May 24 '18 at 12:54
  • This answer seems to be correct given the understanding of the question. However, I think it might lead to dangerous misunderstandings. This is why I provided an alternative answer. – Julian Karch Apr 06 '21 at 16:56

The question is not precise enough to provide a clear answer. The crucial missing part is whether you want to know the amount of variance of $X_1$ that $X_2$ and $X_3$ (or, more generally, your predictors) explain in the sample or in the population. The existing answer seems to have understood your question as being about the sample, and for that case it provides the correct answer: use $R^2$.

However, when applying regression models we are typically not interested in making statements about the sample; rather, we want to generalize them to the population. If you are interested in the amount of variance explained in the population, adjusted $R^2$ is almost always a better estimator than ordinary $R^2$.
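A small Monte Carlo sketch of the sample-versus-population distinction, under assumptions chosen purely for illustration: NumPy, $n = 30$ observations, $p = 5$ independent standard-normal predictors, and coefficients set so that the population $R^2$ is exactly 0.2. It repeatedly draws a sample, computes both statistics, and compares their averages with the known population value.

```python
import numpy as np

def r_squared(y, X):
    """Hypothetical helper: OLS fit of y on X plus an intercept; returns the sample R^2."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    ss_res = np.sum((y - design @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Simulation settings are assumptions, picked for illustration.
rng = np.random.default_rng(2)
n, p, reps = 30, 5, 2000
beta = np.full(p, 0.5 / np.sqrt(p))  # sum(beta**2) = 0.25
noise_sd = 1.0
# Predictors are independent N(0, 1), so the population R^2 is
# explained variance / total variance = 0.25 / (0.25 + 1.0) = 0.2.
rho2 = np.sum(beta ** 2) / (np.sum(beta ** 2) + noise_sd ** 2)

r2_draws, adj_draws = [], []
for _ in range(reps):
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(scale=noise_sd, size=n)
    r2 = r_squared(y, X)
    r2_draws.append(r2)
    adj_draws.append(adjusted_r_squared(r2, n, p))

mean_r2 = np.mean(r2_draws)
mean_adj = np.mean(adj_draws)
print(f"population R^2:    {rho2:.3f}")
print(f"mean sample R^2:   {mean_r2:.3f}  (bias {mean_r2 - rho2:+.3f})")
print(f"mean adjusted R^2: {mean_adj:.3f}  (bias {mean_adj - rho2:+.3f})")
```

In this particular setting the ordinary $R^2$ overstates the population value considerably, while adjusted $R^2$ is much closer on average. How the two compare in general depends on $n$, $p$ and the population value, which is the subject of the paper cited in this answer.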

Which estimator is best, and under which conditions, is actually a rather complicated matter. A paper-length discussion of this (by me) can be found here: https://online.ucpress.edu/collabra/article/6/1/45/114458/Improving-on-Adjusted-R-Squared. Another user provided a nice summary of the main results here: https://stats.stackexchange.com/a/451772/30495.

Julian Karch