
I am trying to understand how AIC works. I am using data from this tutorial: https://www.jaredknowles.com/journal/2013/11/25/getting-started-with-mixed-effect-models-in-r

library(lme4) # load library
library(arm) # convenience functions for regression in R
lmm.data <- read.table("http://bayes.acs.unt.edu:8083/BayesContent/class/Jon/R_SC/Module9/lmm.data.txt",
                       header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)

When fitting the first model we get this AIC:

MLexamp <- glm(extro ~ open + agree + social, data=lmm.data)
AIC(MLexamp)
[1] 8774.291

From my understanding, if I add a variable which is totally uncorrelated to the model, the AIC should compensate the overfitting to stay the same in mean, but it appears to correct more than that:

res.rand <- replicate(1000, {
  lmm.data$rand.cont <- rnorm(nrow(lmm.data))
  list(aic = AIC(glm(extro ~ open + agree + social + rand.cont, data=lmm.data)),
       adj.r2 = summary(lm(extro ~ open + agree + social + rand.cont,
                           data=lmm.data))$adj.r.squared)
}, simplify=FALSE)

mean(sapply(res.rand, "[[", "aic"))
[1] 8775.331

sd(sapply(res.rand, "[[", "aic"))
[1] 1.267697

On average, the AIC is about 1 point higher than in the first model.

If I estimate adjusted R squared from lm, adding the uncorrelated variable has approximately no effect on average:

OLSexamp <- lm(extro ~ open + agree + social, data = lmm.data)
summary(OLSexamp)$adj.r.squared
[1] -0.001984873

mean(sapply(res.rand, "[[", "adj.r2"))
[1] -0.00202206

sd(sapply(res.rand, "[[", "adj.r2"))
[1] 0.001057333
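For reference, this adjusted $R^2$ behaviour can also be reproduced directly from the textbook formula $R^2_{\text{adj}} = 1 - (1 - R^2)\frac{n-1}{n-p-1}$. Here is a minimal sketch in Python/NumPy (not R, to keep it self-contained) on simulated stand-in data rather than lmm.data; the null response mirrors the near-zero adjusted $R^2$ above:

```python
import numpy as np

rng = np.random.default_rng(1)

def adj_r2(X, y):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
    where p is the number of predictors (intercept excluded)."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    p = X.shape[1] - 1  # exclude the intercept column
    return 1 - (rss / tss) * (n - 1) / (n - p - 1)

# Simulated stand-in for lmm.data: a response unrelated to the predictors,
# mirroring the near-zero adjusted R^2 seen above
n = 1200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = rng.normal(size=n)

base = adj_r2(X, y)
# Mean adjusted R^2 after appending one pure-noise predictor, 500 times
with_noise = float(np.mean([
    adj_r2(np.column_stack([X, rng.normal(size=n)]), y)
    for _ in range(500)
]))
print(round(base, 4), round(with_noise, 4))
```

The $(n-1)/(n-p-1)$ factor cancels, on average, the mechanical rise in $R^2$ from a pure-noise column, which is why the mean barely moves.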

Can you figure out what I missed?

Jean Paul

1 Answer

"From my understanding, if I add a variable which is totally uncorrelated to the model, the AIC should compensate the overfitting to stay the same in mean, but it appears to correct more than that:"

This is incorrect in multiple ways. AIC does not 'compensate' anything.

As you add more parameters to a model, its $R^2$ will increase. At some point, though, you will start to overfit your data, meaning the continued increase in $R^2$ is misleading (predictive performance on a holdout dataset gets worse beyond this point, even though the $R^2$ continues to increase). AIC is a tool that can be used to diagnose this misleading increase in model complexity. It is a number that quantifies relative model fit, and it works in part by penalising models for being more complex, i.e., having more parameters. As long as an additional parameter adds some degree of explanatory/predictive power to the model, the AIC will improve (decrease) in spite of the parameter penalty.

And if you add an uncorrelated variable as a predictor, you are making the model more complex while adding essentially zero explanatory/predictive power to it. So the AIC gets worse (increases). Which is exactly what it is designed to do.
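To make the arithmetic concrete, here is a minimal sketch of $\mathit{AIC} = 2k - 2\ln(L)$ for Gaussian OLS. It uses Python/NumPy on simulated data (not the tutorial's lmm.data, and not R's glm, so treat it as an illustration rather than a reproduction) and shows that appending a pure-noise predictor raises the AIC on average:

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_aic(X, y):
    """AIC = 2k - 2*ln(L) for Gaussian OLS, where k counts the
    regression coefficients plus the error variance."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    lnL = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    k = X.shape[1] + 1  # coefficients + sigma^2
    return 2 * k - 2 * lnL

# Simulated data standing in for lmm.data (any regression dataset works)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(size=n)

aic_base = ols_aic(X, y)

# Append a pure-noise predictor many times and record the AIC change
deltas = [ols_aic(np.column_stack([X, rng.normal(size=n)]), y) - aic_base
          for _ in range(500)]
print(round(aic_base, 1), round(float(np.mean(deltas)), 2))
```

Each useless parameter adds 2 to the penalty term, while the likelihood-ratio statistic $2\Delta\ln(L)$ is asymptotically $\chi^2_1$-distributed with mean 1, so the AIC rises by about 1 point on average, consistent with the roughly 1-point mean increase observed in the question.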

mkt
  • I meant it compensates overfitting in the sense of the formula $\mathit{AIC} = 2k - 2\ln(L)$: if one adds an uncorrelated random variable, log-likelihood will increase due to overfitting but not AIC thanks to the 2k penalty. In fact, adjusted $R^2$ does not increase after adding uncorrelated variables because it also corrects for overfitting. From my understanding, the difference between adjusted $R^2$ and AIC is that adjusted $R^2$ only represents the true information that is captured by the model, whereas AIC also penalizes for false information. – Jean Paul Sep 26 '19 at 14:27
  • @JeanPaul "*log-likelihood will increase due to overfitting but not AIC thanks to the 2k penalty*". Yes, exactly. Your error is in the earlier quotation "*AIC... [will] stay the same in mean, but it appears to correct more than that*". It will not stay the same - it will get **worse**, and that is by design. If it stayed the same when you added an uncorrelated variable, there would be no use in AIC at all. – mkt Sep 26 '19 at 14:59
  • @JeanPaul You are correct that adjusted $R^2$ also penalises models for complexity. However, this penalty is not very large and the claim that "*...adjusted $R^2$ only represents the true information that is captured by the model*" is not accurate. See [this comment](https://stats.stackexchange.com/questions/13314/is-r2-useful-or-dangerous#comment23324_13317) and the associated answer for more information. – mkt Sep 26 '19 at 15:05
  • One limitation I find with AIC compared to adjusted $R^2$ is that AIC depends on sample size: adding a variable with a very noisy but real signal will not add useful information if the sample size is too small, so the AIC will increase. With a large enough sample size, though, the same variable will provide some useful information, so the AIC will decrease. Adjusted $R^2$ does not have this issue: even with small sample sizes it will be positive in mean, and it will stay the same in mean as the sample size increases. – Jean Paul Sep 26 '19 at 21:55
  • @JeanPaul If you wish to ask a question about comparing AIC and adjusted $R^2$, you are welcome to do so. I believe my present answer has addressed the question you posed. – mkt Sep 27 '19 at 04:54
  • My primary goal was not to compare AIC and adjusted $R^2$; I just used $R^2$ to better discern the limitation of AIC. I think the limitation of AIC is that it fails to generalize outside of the current sample, so it cannot say in absolute terms whether one model is better than another. It is nevertheless a use that I have seen in the literature. – Jean Paul Sep 27 '19 at 08:49
  • @JeanPaul I don't understand your new claim about AIC, but it seems sufficiently different that it may be worth asking a new question about that as well. – mkt Sep 27 '19 at 09:04
  • Here is my new question: https://stats.stackexchange.com/questions/428974/can-results-for-model-selection-with-aic-be-interpretable-at-the-population-leve – Jean Paul Sep 27 '19 at 11:15