When using the `drop1` command in R for model building, I have read that the variable with the lowest AIC value should be dropped. What is the reason for this? I know that AIC is about information loss and that a lower AIC value is better, so dropping the variable with the lowest AIC seems counterintuitive. Can someone please explain the reasoning?
Check http://stats.stackexchange.com/questions/20836/algorithms-for-automatic-model-selection/20856#20856 – Tim May 22 '16 at 10:55
1 Answer
The AIC reported by `drop1` relates to the whole model, not to an individual variable: each row gives the AIC of the model that results from removing that term. The output therefore tells you which variable to remove in order to obtain the model with the lowest AIC. For example, with the built-in dataset `swiss`:
lm1 <- lm(Fertility ~ ., data = swiss)
drop1(lm1, test = "F") # So called 'type II' anova
Single term deletions
Model:
Fertility ~ Agriculture + Examination + Education + Catholic +
Infant.Mortality
                 Df Sum of Sq    RSS    AIC F value    Pr(>F)
<none>                        2105.0 190.69
Agriculture       1    307.72 2412.8 195.10  5.9934  0.018727 *
Examination       1     53.03 2158.1 189.86  1.0328  0.315462
Education         1   1162.56 3267.6 209.36 22.6432 2.431e-05 ***
Catholic          1    447.71 2552.8 197.75  8.7200  0.005190 **
Infant.Mortality  1    408.75 2513.8 197.03  7.9612  0.007336 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Here, removing `Examination` yields the model with the lowest AIC (189.86, compared with 190.69 for the full model in the `<none>` row).
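As a rough check of this reading, the AIC in a term's row can be compared with the AIC of a model refitted without that term. The sketch below assumes `extractAIC()` is the matching criterion (for `lm` fits it is on the same scale as the `drop1` output, and it differs from `AIC()` only by a constant for models fitted to the same data). The names `d1`, `lm_no_exam`, `aic_refit`, `aic_drop1` and `best` are illustrative and not part of the original answer.

```r
# Full model on the built-in swiss data, as in the example above
lm1 <- lm(Fertility ~ ., data = swiss)
d1 <- drop1(lm1)

# Refit the model without Examination and compute its AIC directly
lm_no_exam <- update(lm1, . ~ . - Examination)
aic_refit <- extractAIC(lm_no_exam)[2]  # second element is the AIC

# AIC that drop1 reports in the Examination row
aic_drop1 <- d1["Examination", "AIC"]

# TRUE if the row's AIC is the AIC of the reduced model
all.equal(aic_refit, aic_drop1)

# Row with the smallest AIC: the candidate term to drop,
# or "<none>" if no single deletion improves on the full model
best <- rownames(d1)[which.min(d1$AIC)]
best
```

If `best` were `<none>`, no single-term deletion would lower the AIC and the full model would be kept under this criterion.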
On a related note, while it may be better to use AIC than p-values, using any automatic model selection algorithm is generally considered bad practice; see the question "Algorithms for automatic model selection".

Robert Long
In fact, if you're looking at a single variable at a time, using AIC corresponds to setting a p-value cut-off of 15.7%. – Glen_b May 22 '16 at 14:49
@Glen_b Interesting! I assume that's an asymptotic result based on the tail of the chi-square distribution? Then p-values and AIC are equally bad for large samples!? – Robert Long May 22 '16 at 15:02
Yes, an asymptotic chi-square result (if you use R, it's `pchisq(2,1,lower.tail=FALSE)`); it will correspond to a two-tailed z-test p-value (`pnorm(sqrt(2),lower.tail=FALSE)*2`), and so unless the d.f. are fairly small it will also closely approximate a t-test or F-test p-value cut-off (above 40 d.f. it's 16% to the nearest whole percent for any d.f.) – Glen_b May 22 '16 at 22:53
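The cut-off discussed in these comments can be reproduced with a few lines of R. Dropping a single 1-d.f. term reduces the AIC penalty by 2, so AIC prefers the smaller model when the likelihood-ratio statistic is below 2. The sketch below evaluates the two expressions from the comment and, as an approximation only, the F-test tail probability at the same value 2 for a few residual degrees of freedom. The vector `dfs` and the variable names are illustrative choices.

```r
# Asymptotic chi-square tail probability at 2 with 1 d.f.
p_chisq <- pchisq(2, df = 1, lower.tail = FALSE)

# Equivalent two-tailed z-test p-value at |z| = sqrt(2)
p_z <- 2 * pnorm(sqrt(2), lower.tail = FALSE)

# Approximate F-test counterpart for several residual d.f.
# (treating F = 2 as the cut-off is the approximation here)
dfs <- c(10, 50, 100, 1000)
p_f <- pf(2, df1 = 1, df2 = dfs, lower.tail = FALSE)

round(c(chisq = p_chisq, z = p_z), 4)  # both about 0.1573
round(setNames(p_f, dfs), 3)           # approaches the chi-square value as d.f. grow
```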