
Adjusted $R^2$ is said to be less biased than ordinary $R^2$ because it takes the number of explanatory variables into account.

Can adjusted $R^2$ be used in a model with only an intercept and one independent variable?

Following this, let's say we want to compare two nested linear models, one with a single independent variable and the other adding one more variable. I gather that adjusted $R^2$ should be calculated for the bigger model, but should it be compared with the smaller model's ordinary or adjusted $R^2$?
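
For reference, by adjusted $R^2$ I mean the usual definition (with $n$ observations and $p$ predictors, not counting the intercept),

$$R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-p-1},$$

so the size of the adjustment depends on $p$ relative to $n$.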

    $R^2_{adj.}$ is an unbiased estimator of the population $R^2$ under the null hypothesis that all the variables have zero slopes in the population, which is not a very interesting case. $R^2_{adj.}$ is not necessarily unbiased in other, more interesting cases, and whether one should prefer $R^2_{adj.}$ to $R^2$ depends on the true slopes. – Richard Hardy May 14 '19 at 10:13

2 Answers

2

Some additional notes to add to what has been said so far.

Note that $R^{2}$ cannot decrease when one adds a new variable; it can only increase. So even if you add purely random variables, $R^{2}$ can become quite high. See the following example in R:

set.seed(10) # make the example reproducible
n <- 100     # sample size
k <- 20      # number of predictors
df <- data.frame(y = rnorm(n), matrix(rnorm(n * k), ncol = k)) # generate some *random* data
summary(lm(y ~ ., data = df)) # fit a regression model

# results
# Multiple R-squared:  0.2358
# Adjusted R-squared:  0.0423

$R^{2}$ is 0.2358, which is far too high if we keep in mind that we used only random variables. On the other hand, $R^{2}_{adj}$ is 0.0423, which is much closer to what we would expect to happen when the predictors are pure noise.

This is useful, but if you use $R^{2}_{adj}$ for models with only a few variables, keep in mind that $R^{2}_{adj}$ can take negative values. See here:

radj <- rep(NA, ncol(df) - 1) # vector for results
for (i in 2:ncol(df)) { # determine radj for every x separately
  radj[i - 1] <- summary(lm(y ~ df[, i], data = df))$adj.r.squared
}

sum(radj < 0) # number of negative radj
# 11

In this example, 11 of the 20 single-predictor models have a negative $R^{2}_{adj}$. I agree with the suggestion of @kjetil b halvorsen (+1). I just want to point out this property of $R^{2}_{adj}$, which you might encounter since you want to use $R^{2}_{adj}$ with only a few variables, and because a negative value might be confusing at first.
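
As a rough check on where these negative values come from, here is a small sketch that reuses df from the code above and assumes the usual definition of the adjustment, $R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-p-1}$, under which $R^2_{adj} < 0$ exactly when $R^2 < p/(n-1)$:

# with n = 100 and p = 1 the threshold p / (n - 1) is about 0.0101, so any
# single predictor explaining less than roughly 1% of the variance gets a
# negative adjusted R-squared
n <- nrow(df)
p <- 1
fit <- summary(lm(y ~ df[, 2], data = df)) # first of the random predictors
c(r2          = fit$r.squared,
  adj_by_hand = 1 - (1 - fit$r.squared) * (n - 1) / (n - p - 1),
  adj_from_lm = fit$adj.r.squared) # the two adjusted values agree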

0

With only one (or only a few) predictor variables, the adjusted R-squared can be used, but it will not be very different from the unadjusted R-squared, so it doesn't really matter. The adjustment was invented as a solution to problems caused by variable selection, so if you are not doing variable selection it isn't necessary.
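
As a small illustration of how little the two differ with a single predictor (made-up data, not from the question):

set.seed(1)
n <- 100
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
fit <- summary(lm(y ~ x))
# with one predictor the gap is (1 - R^2) / (n - 2), i.e. less than about 0.01 here
c(r2 = fit$r.squared, r2_adj = fit$adj.r.squared)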

But if you are using R-squared to compare models, you should use the same version in all cases. So maybe just stay with the adjusted R-squared.
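
For the nested-model comparison in the question, a rough sketch of what that looks like (hypothetical data; the point is simply that both models are judged on the same, adjusted criterion):

set.seed(2)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)               # irrelevant extra predictor
y <- 1 + 0.5 * x1 + rnorm(n)
small <- summary(lm(y ~ x1))
big   <- summary(lm(y ~ x1 + x2))
# ordinary R-squared can only increase when x2 is added;
# adjusted R-squared can decrease, which is why the same
# (adjusted) version should be used for both models
c(small = small$adj.r.squared, big = big$adj.r.squared)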

kjetil b halvorsen