
Consider a simple regression model $$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i $$ and suppose it is the correct model for the data. As far as I know, $R^2_{adjusted}$ is an unbiased estimator of the population $R^2$ under the null hypothesis that $\beta_1=0$. I believe it is consistent, too.

Questions:

  1. Is $R^2_{adjusted}$ an unbiased estimator of the population $R^2$ under the alternative that $\beta_1\neq 0$?
  2. Is it consistent?
Richard Hardy

1 Answer


As a preliminary result, $R^2_{adjusted}$ is indeed unbiased under the null, at least under error normality.

From this question we have that $$ R^2\sim Beta(1/2,(n-2)/2) $$ under the null in the present setting of a simple regression ($k=2$). Hence, its mean is $$ E(R^2)=\frac{1}{n-1}, $$ so that, from $$ R^2_{adjusted}=1-(1-R^2)\frac{n-1}{n-2}, $$ we find $$ E(R^2_{adjusted})=1-(1-E(R^2))\frac{n-1}{n-2}=0. $$ In fact, this result does not hinge on the simple regression case, as $R^2\sim Beta((k-1)/2,(n-k)/2)$ in general, so that $E(R^2)=(k-1)/(n-1)$ and $$ E\left(1-(1-R^2)\frac{n-1}{n-k}\right)=0. $$
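
As a quick sanity check of this null result, here is a minimal simulation sketch (it assumes normal errors and a single regressor as above; the object names are just illustrative):

reps <- 10000
n <- 10
adj.R2.null <- rep(NA, reps)
for (i in 1:reps){
  x <- rnorm(n)                                      # any regressor will do
  y <- rnorm(n)                                      # beta_1 = 0: y unrelated to x
  adj.R2.null[i] <- summary(lm(y~x))$adj.r.squared
}
mean(adj.R2.null)                                    # should be very close to 0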

As to consistency, it holds for any coefficient vector $\beta$: write $$ R^2=1-\frac{\hat{u}'\hat{u}/n}{\tilde{y}'\tilde{y}/n} $$ with $\tilde{y}$ denoting the demeaned $y$'s. Standard laws of large numbers give us that sample variances consistently estimate population variances, so that $\hat{u}'\hat{u}/n\to_p\sigma^2_u$ and $\tilde{y}'\tilde{y}/n\to_p\sigma^2_y$.

Hence, by Slutsky's theorem, $$ R^2\to_p1-\frac{\sigma^2_u}{\sigma^2_y}, $$ i.e., (at least what I consider) the population $R^2$. Since $R^2_{adjusted}-R^2=o_p(1)$, the same holds true for $R^2_{adjusted}$.
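
To illustrate this consistency claim, here is a minimal sketch using the same Gaussian design as in the simulation further below, with one draw per sample size (object names again just illustrative):

beta <- 1
V.u <- 2
V.x <- 3
(pop.R2 <- 1-V.u/(beta^2*V.x+V.u))                   # population R^2 = 0.6
for (n in c(10, 100, 1000, 10000)){
  u <- rnorm(n, sd=sqrt(V.u))
  x <- rnorm(n, sd=sqrt(V.x))
  y <- beta*x + u
  print(c(n, summary(lm(y~x))$adj.r.squared))
}

The adjusted $R^2$ should settle near the population value 0.6 as $n$ grows.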

As for the mean of $R^2_{adjusted}$ under the alternative, this thread appears helpful. It establishes a noncentral beta distribution for $R^2$ under the alternative. I have not been able to use results like these to say something precise about $E(R^2)$.

In any case, this little simulation suggests that the answer is no:

reps <- 10000                                        # number of Monte Carlo replications
adj.R2 <- rep(NA,reps)                               # storage for the adjusted R^2 values
beta <- 1                                            # slope
n <- 10                                              # sample size

V.u <- 2                                             # error variance
V.x <- 3                                             # regressor variance
for (i in 1:reps){
  u <- rnorm(n, sd=sqrt(V.u))                        # errors
  x <- rnorm(n, sd=sqrt(V.x))                        # regressor
  y <- beta*x + u                                    # DGP under the alternative
  adj.R2[i] <- summary(lm(y~x))$adj.r.squared        # store adjusted R^2
}

Result:

> mean(adj.R2)
[1] 0.5444916

> (pop.R2 <- 1-V.u/(beta^2*V.x+V.u))
[1] 0.6
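
As a cross-check, a sketch of how the noncentral beta result could be used numerically: conditional on $x$, fixed-design normal theory gives $R^2\sim Beta(1/2,(n-2)/2,\lambda)$ with noncentrality $\lambda=\beta_1^2\sum_i(x_i-\bar{x})^2/\sigma^2_u$, so $E(R^2_{adjusted}\mid x)$ can be obtained by numerical integration and then averaged over fresh draws of $x$ (object names illustrative):

n <- 10; beta <- 1; V.u <- 2; V.x <- 3               # as in the simulation above
E.adj.R2 <- rep(NA, 2000)
for (i in 1:2000){
  x <- rnorm(n, sd=sqrt(V.x))
  lambda <- beta^2*sum((x-mean(x))^2)/V.u            # noncentrality parameter
  E.R2 <- integrate(function(r) r*dbeta(r, 1/2, (n-2)/2, ncp=lambda), 0, 1)$value
  E.adj.R2[i] <- 1-(1-E.R2)*(n-1)/(n-2)              # E(adjusted R^2 | x)
}
mean(E.adj.R2)                                       # should be close to mean(adj.R2), still below 0.6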
Christoph Hanck
  • This http://functions.wolfram.com/webMathematica/FunctionEvaluation.jsp?name=Hypergeometric2F2 could maybe be used for computing the exact expectation. – Christoph Hanck Apr 27 '18 at 08:52