
When using the drop1 command in R for model building, it is said that the variable with the lowest AIC value should be dropped. What is the reason for this? I know AIC measures information loss and that a lower AIC value is better, but dropping the variable with the lowest AIC seems counterintuitive. Can someone please explain the reasoning?

Jash Shah
    Check http://stats.stackexchange.com/questions/20836/algorithms-for-automatic-model-selection/20856#20856 – Tim May 22 '16 at 10:55

1 Answer


The AIC reported by drop1 relates to the whole model, not to an individual variable: each row shows the AIC of the model that results from dropping that term. The output therefore tells you which variable to remove in order to obtain the model with the lowest AIC. For example, with the built-in dataset swiss:

lm1 <- lm(Fertility ~ ., data = swiss)
drop1(lm1, test = "F")  # so-called 'Type II' ANOVA

Single term deletions

Model:
Fertility ~ Agriculture + Examination + Education + Catholic + 
    Infant.Mortality
                 Df Sum of Sq    RSS    AIC F value    Pr(>F)    
<none>                        2105.0 190.69                      
Agriculture       1    307.72 2412.8 195.10  5.9934  0.018727 *  
Examination       1     53.03 2158.1 189.86  1.0328  0.315462    
Education         1   1162.56 3267.6 209.36 22.6432 2.431e-05 ***
Catholic          1    447.71 2552.8 197.75  8.7200  0.005190 ** 
Infant.Mortality  1    408.75 2513.8 197.03  7.9612  0.007336 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Here, removing Examination yields the model with the lowest AIC (189.86, versus 190.69 for the full model).
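As a quick sanity check, you can refit the model without Examination and confirm that it reproduces the AIC shown in the drop1 table (the values 190.69 and 189.86 come from the output above; extractAIC is the AIC definition that drop1 uses, which can differ from stats::AIC by an additive constant):

```r
lm1 <- lm(Fertility ~ ., data = swiss)
lm2 <- update(lm1, . ~ . - Examination)  # drop the Examination term

# extractAIC() returns c(equivalent df, AIC)
extractAIC(lm1)  # full model: AIC 190.69
extractAIC(lm2)  # without Examination: AIC 189.86
```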

On a related note: while using AIC may be preferable to using p-values, any automatic model-selection algorithm is generally considered bad practice: Algorithms for automatic model selection

Robert Long
    In fact if you're looking at a single variable at a time, using AIC corresponds to setting a p-value cut-off of 15.7% – Glen_b May 22 '16 at 14:49
  • @Glen_b Interesting! I assume that's an asymptotic result based on the chi-square distribution tail? Then using p-values and AIC are equally bad, for large samples!? – Robert Long May 22 '16 at 15:02
    Yes, an asymptotic chi-square result (if you use R, it's `pchisq(2,1,lower.tail=FALSE)`); it will correspond to a two-tailed z-test p-value (`pnorm(sqrt(2),lower.tail=FALSE)*2`), and so unless the d.f. are fairly small it will also closely approximate a t-test or F-test p-value cut off (above 40 d.f. it's 16% to the nearest whole percent for any d.f.) – Glen_b May 22 '16 at 22:53
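The cutoffs quoted in the comments above can be checked directly in R. Dropping a single term (1 df) changes AIC by 2 minus the change in deviance, so AIC prefers the smaller model exactly when the chi-square statistic is below 2, i.e. when the p-value exceeds about 15.7%:

```r
# p-value cut-off implied by a single-term AIC comparison (1 df):
pchisq(2, df = 1, lower.tail = FALSE)    # ~0.157

# equivalent two-tailed z-test cut-off, as in Glen_b's comment:
pnorm(sqrt(2), lower.tail = FALSE) * 2   # ~0.157
```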