3

If you have several linear models, say model1, model2 and model3, how would you cross-validate them to pick the best one?

(In R)

I'm wondering this because my AIC and BIC for each model are not helping me determine a good model. Here are the results:

Model            Size (incl. response)   Mallows' Cp      AIC       BIC
Intercept only        1                    2860.15      2101.61   2205.77
1                     5                     245.51      1482.14   1502.97
2                     6                     231.10      1472.88   1497.87
3                     7                     179.76      1436.29   1465.45
4                     8                     161.05      1422.43   1455.75
5                     9                      85.77      1360.06   1397.55
6                    10                      79.67      1354.79   1396.44
7                    17                      27.00      1304.23   1375.04
All variables        25                      37.92      1314.94   1419.07

Note - assume the models are nested.

Stat
Dino Abraham
  • Why isn't your AIC helping? AIC and BIC even appear to coincide here. – charles Mar 10 '14 at 00:34
  • The lowest AIC/BIC is obtained on the 17-variable model - so is this the best? Shouldn't I do some calculations to check whether the 'additional AIC or BIC' is worth an extra variable? Also - would you use LASSO to get several models, subsets regression, or neither? I used Subsets to create 6 models, but now I'm wondering if it's better to use LASSO. The question also is: is this method even correct for justifying which model is the best 'compromise'? – Dino Abraham Mar 10 '14 at 02:09

1 Answer

4

The difference in AIC between two specified models is the estimated expected difference in Kullback–Leibler divergence from each to the true model, & therefore a useful model-selection criterion. The difference in AIC between the model with the lowest AIC out of several & the one with the second-lowest is something else, & therefore has to be taken with a pinch of salt as a model-selection criterion. If 'several' becomes 'many' it's not much use at all. As @gung says here, "if you run a study several times and fit the same model, the AIC will bounce around just like everything else".
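To make the comparison concrete, here is a minimal R sketch of ranking a handful of fitted models by their AIC differences; the data frame dat and the model formulas are placeholders standing in for whatever fits you actually have:

```r
## Minimal sketch: AIC differences for a few candidate lm fits.
## 'dat' and the formulas are placeholders for your own data and models.
model1 <- lm(y ~ x1,           data = dat)
model2 <- lm(y ~ x1 + x2,      data = dat)
model3 <- lm(y ~ x1 + x2 + x3, data = dat)

aics <- sapply(list(model1, model2, model3), AIC)
aics - min(aics)   # AIC difference of each model from the best of the three
```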

What you need to cross-validate is the fit of the final model, repeating the entire model selection procedure within each cross-validation fold. You mention in a comment that you used "Subsets" to create six models, which suggests some selection going on there. Even if you use a measure of fit calibrated by cross-validation instead of AIC to pick the best model, you still need to cross-validate that procedure (see @Dikran's answer here, or @Bogdanovist's here).
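As a rough illustration of what repeating the selection within each fold can look like in R: this is only a sketch, assuming a data frame dat with response column y and using best-subsets selection via the leaps package (substitute whatever selection procedure you actually used).

```r
## Sketch of cross-validating the whole selection procedure:
## best-subsets selection (leaps::regsubsets) is redone inside every fold.
## 'dat' is a placeholder data frame with response column 'y'.
library(leaps)

k      <- 10
nv.max <- 10                                   # largest subset size to consider
folds  <- sample(rep(1:k, length.out = nrow(dat)))
cv.err <- matrix(NA, k, nv.max)

for (j in 1:k) {
  train <- dat[folds != j, ]
  test  <- dat[folds == j, ]
  fit   <- regsubsets(y ~ ., data = train, nvmax = nv.max)  # selection on the training fold only
  test.mat <- model.matrix(y ~ ., data = test)
  for (size in 1:nv.max) {
    b    <- coef(fit, id = size)
    pred <- test.mat[, names(b)] %*% b
    cv.err[j, size] <- mean((test$y - pred)^2)
  }
}
colMeans(cv.err)   # CV estimate of prediction error for each subset size
```

Because regsubsets() is called inside the loop, the variable selection itself is repeated on every training fold, so the resulting error estimates are not biased by having peeked at the held-out data during selection.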

Scortchi - Reinstate Monica
  • Interesting. If I use a cross-validated LASSO function, then change the 'fraction' so it produces several models, and then select the best one on the change in AIC/BIC, is that okay, since the models have been cross-validated? This is what is confusing me quite a bit... – Dino Abraham Mar 10 '14 at 18:53
  • I'm not sure how to say what I've already said differently. See @cbeleites' answer [here](http://stats.stackexchange.com/questions/65128/nested-cross-validation-for-model-selection). Basing model selection on cross-validation is itself a selection procedure & can be cross-validated - the necessity for cross-validation increases with the number of possible models you're selecting from. – Scortchi - Reinstate Monica Mar 10 '14 at 20:22
  • Is the Kullback–Leibler divergence = exp((AIC_min - AIC_i)/2)? I found this on the wiki but it doesn't say what 'number range' we look for in a model to select it. I have created 10 cross-validated models using LASSO, just for different 'fractions'. The link you provided explains that I need to cross-validate again, but I'm unsure how to implement this in R (assume all models are in linear model form). – Dino Abraham Mar 10 '14 at 22:45
  • For Kullback–Leibler divergence I need a distribution per linear model, right? How do I get this? I found ~AIC = exp((AIC_min - AIC_i)/2) on the wiki, to see if a model carries more information than the minimum-AIC model, but it doesn't say what 'number range' we look for in a model to select it. I have created 10 cross-validated models using LASSO, just for different 'fractions'. The link you provided explains that I need to cross-validate again, but I'm unsure how to implement this in R (assume all models are in linear model form). In fact, reading the link again, it seems I cannot pick a model? – Dino Abraham Mar 10 '14 at 22:54
  • Ignore the comment above about the number range. It says that if exp((AIC_min - AIC_i)/2) > 10 we reject model i. In this case I reject all my models except the OLS. However, this is bad, as I know several terms are collinear. Something here is not making sense :( (a worked example of this formula follows below) – Dino Abraham Mar 10 '14 at 23:41
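Regarding the relative-likelihood formula discussed in the last few comments: the usual convention (e.g. in Burnham & Anderson) is delta_i = AIC_i - AIC_min, with relative likelihood exp(-delta_i/2), so a model loses support when that number is small (delta_i larger than about 10), not when it exceeds 10. A small worked example in R, using the AIC values from the table in the question:

```r
## Relative likelihood of each model under the usual convention:
## delta_i = AIC_i - AIC_min, relative likelihood = exp(-delta_i / 2).
## AIC values are taken from the table in the question.
aic <- c(2101.61, 1482.14, 1472.88, 1436.29, 1422.43,
         1360.06, 1354.79, 1304.23, 1314.94)
names(aic) <- c("Intercept only", paste("Model", 1:7), "All variables")

delta   <- aic - min(aic)
rel.lik <- exp(-delta / 2)   # 1 for the lowest-AIC model, near 0 for the rest
round(rel.lik, 4)
```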