
I am curious about the statement made at the bottom of the first page in this text regarding the $R^2_\mathrm{adjusted}$ adjustment

$$R^2_\mathrm{adjusted} =1-(1-R^2)\left({\frac{n-1}{n-m-1}}\right).$$

The text states:

"The logic of the adjustment is the following: in ordinary multiple regression, a random predictor explains on average a proportion $1/(n-1)$ of the response’s variation, so that $m$ random predictors explain together, on average, $m/(n-1)$ of the response’s variation; in other words, the expected value of $R^2$ is $\mathbb{E}(R^2) = m/(n-1)$. Applying the [$R^2_\mathrm{adjusted}$] formula to that value, where all predictors are random, gives $R^2_\mathrm{adjusted} = 0$."

This seems to be a very simple and interpretable motivation for $R^2_\mathrm{adjusted}$. However, I have not been able to show that $\mathbb{E}(R^2)=1/(n-1)$ for a single random (i.e. uncorrelated) predictor.

Could someone point me in the right direction here?

amoeba
gregory_britten

1 Answer


The statement is correct. See this post for the derivation of the distribution of $R^2$ under the hypothesis that all regressors (bar the constant term) are uncorrelated with the dependent variable ("random predictors").

With $m$ the number of predictors (not counting the constant term) and $n$ the sample size, this distribution is a Beta:

$$R^2 \sim \mathrm{Beta}\left(\frac{m}{2}, \frac{n-m-1}{2}\right)$$

and so

$$E(R^2) = \frac {m/2}{(m/2)+[(n-m-1)/2]} = \frac{m}{n-1}$$

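This is a clever way to justify the logic behind the adjusted $R^2$. Because $R^2_\mathrm{adjusted}$ is linear in $R^2$, its expectation follows by substituting $E(R^2) = m/(n-1)$ into the formula:

$$E(R^2_\mathrm{adjusted}) = 1-\left(1-\frac{m}{n-1}\right)\frac{n-1}{n-m-1} = 1-\frac{n-m-1}{n-1}\cdot\frac{n-1}{n-m-1} = 0.$$

So if all regressors are indeed uncorrelated with the dependent variable, the adjusted $R^2$ is zero on average.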
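The result $E(R^2) = m/(n-1)$ can also be checked numerically. The following is a minimal simulation sketch, not part of the cited derivation: it assumes the response and all predictors are independent standard normal draws, and the helper names (`center`, `dot`, `r_squared`) and the values of `n`, `m`, and `reps` are illustrative choices. It computes $R^2$ by centering the variables (which accounts for the constant term) and projecting the response onto an orthonormalized set of predictors.

```python
import random

def center(v):
    """Subtract the mean, which is equivalent to fitting an intercept."""
    mu = sum(v) / len(v)
    return [x - mu for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def r_squared(y, predictors):
    """R^2 of a least-squares fit of y on the predictors plus a constant."""
    yc = center(y)
    basis = []
    for col in predictors:
        v = center(col)
        # Gram-Schmidt: remove the part already spanned by earlier predictors.
        for q in basis:
            coef = dot(v, q)
            v = [vi - coef * qi for vi, qi in zip(v, q)]
        norm = dot(v, v) ** 0.5
        basis.append([vi / norm for vi in v])
    # Explained sum of squares is the squared length of the projection of y.
    explained = sum(dot(yc, q) ** 2 for q in basis)
    return explained / dot(yc, yc)

# Illustrative settings (assumptions, not taken from the question).
n, m, reps = 30, 3, 10000
rng = random.Random(0)

r2_values = []
for _ in range(reps):
    y = [rng.gauss(0, 1) for _ in range(n)]
    X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
    r2_values.append(r_squared(y, X))

adj_values = [1 - (1 - r2) * (n - 1) / (n - m - 1) for r2 in r2_values]
mean_r2 = sum(r2_values) / reps
mean_adj = sum(adj_values) / reps

print("simulated mean R^2:         ", mean_r2)
print("theoretical m/(n-1):        ", m / (n - 1))
print("simulated mean adjusted R^2:", mean_adj)
```

Under these assumptions the simulated mean of $R^2$ should land close to $m/(n-1)$ and the mean of the adjusted $R^2$ close to zero, up to Monte Carlo error.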

Alecos Papadopoulos