
Adjusted $R^2$ is said to be less biased than ordinary $R^2$ because it takes the number of explanatory variables into account.

Can adjusted $R^2$ be used in a model with only an intercept and one independent variable?

Following this, let's say we want to compare two nested linear models, one with a single independent variable and the other adding one more variable. I gather that adjusted $R^2$ should be calculated for the bigger model, but should it be compared with the smaller model's ordinary or adjusted $R^2$?
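
For reference, by adjusted $R^2$ I mean the usual definition (with $n$ observations and $p$ predictors, not counting the intercept),

$$R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-p-1},$$

so the size of the adjustment depends on $p$ relative to $n$.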

    $R^2_{adj.}$ is an unbiased estimator of the population $R^2$ under the null hypothesis that all the variables have zero slopes in the population, which is not a very interesting case. $R^2_{adj.}$ is not necessarily unbiased in other, more interesting cases, and whether one should prefer $R^2_{adj.}$ to $R^2$ depends on the true slopes. – Richard Hardy May 14 '19 at 10:13

2 Answers

2

Some additional notes to add to what has been said so far.

Note that $R^{2}$ cannot decrease when one adds a new variable; it can only increase. So even if you add purely random variables, $R^{2}$ can become quite high. See the following example in R:

set.seed(10) # make the example reproducible
n <- 100     # sample size
k <- 20      # number of predictors
df <- data.frame(y = rnorm(n), matrix(rnorm(n * k), ncol = k)) # generate some *random* data
summary(lm(y ~ ., data = df)) # fit a regression model

# results
# Multiple R-squared:  0.2358
# Adjusted R-squared:  0.0423

$R^{2}$ is 0.2358, which is far too high if we keep in mind that we used only random variables. On the other hand, $R^{2}_{adj}$ is 0.0423, which is much closer to what we would expect to happen when the predictors are pure noise.

This is useful, but if you use $R^{2}_{adj}$ for models with only a few variables, keep in mind that $R^{2}_{adj}$ can take negative values. See here:

radj <- rep(NA, ncol(df) - 1) # vector for results
for (i in 2:ncol(df)) { # determine radj for every x separately
  radj[i - 1] <- summary(lm(y ~ df[, i], data = df))$adj.r.squared
}

sum(radj < 0) # number of negative radj
# 11

In this example, 11 of the 20 single-predictor models have a negative $R^{2}_{adj}$. I agree with the suggestion of @kjetil b halvorsen (+1). I just want to point out this property of $R^{2}_{adj}$, which you might encounter since you want to use $R^{2}_{adj}$ with only a few variables, and because a negative value might be confusing at first.
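
As a rough check on where these negative values come from, here is a small sketch that reuses df from the code above and assumes the usual definition of the adjustment, $R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-p-1}$, under which $R^2_{adj} < 0$ exactly when $R^2 < p/(n-1)$:

# with n = 100 and p = 1 the threshold p / (n - 1) is about 0.0101, so any
# single predictor explaining less than roughly 1% of the variance gets a
# negative adjusted R-squared
n <- nrow(df)
p <- 1
fit <- summary(lm(y ~ df[, 2], data = df)) # first of the random predictors
c(r2          = fit$r.squared,
  adj_by_hand = 1 - (1 - fit$r.squared) * (n - 1) / (n - p - 1),
  adj_from_lm = fit$adj.r.squared) # the two adjusted values agree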

0

With only one (or only a few) predictor variables, the adjusted R-squared can be used, but it will not be very different from the unadjusted R-squared, so it doesn't really matter. The adjustment was invented as a solution to problems caused by variable selection, so if you are not doing variable selection it isn't necessary.
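
As a small illustration of how little the two differ with a single predictor (made-up data, not from the question):

set.seed(1)
n <- 100
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
fit <- summary(lm(y ~ x))
# with one predictor the gap is (1 - R^2) / (n - 2), i.e. less than about 0.01 here
c(r2 = fit$r.squared, r2_adj = fit$adj.r.squared)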

But if you are using R-squared to compare models, you should use the same version in all cases. So maybe just stay with the adjusted R-squared.
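
For the nested-model comparison in the question, a rough sketch of what that looks like (hypothetical data; the point is simply that both models are judged on the same, adjusted criterion):

set.seed(2)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)               # irrelevant extra predictor
y <- 1 + 0.5 * x1 + rnorm(n)
small <- summary(lm(y ~ x1))
big   <- summary(lm(y ~ x1 + x2))
# ordinary R-squared can only increase when x2 is added;
# adjusted R-squared can decrease, which is why the same
# (adjusted) version should be used for both models
c(small = small$adj.r.squared, big = big$adj.r.squared)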

kjetil b halvorsen