
Chris Chatfield, whose many quality books and papers I enjoyed reading, in (1) gives the following advice:

For example, the choice between ARIMA time-series models with low and approximately equal values of the AIC should probably be made, not on which happens to give the minimum AIC, but on which gives the best forecasts of the most recent year's data.

What is the rationale for such advice? If it is sound, why do forecast::auto.arima and other forecasting routines not follow it? Is it yet to be implemented? It has already been discussed here that looking for the model that just happens to give the minimum AIC is probably not a good idea. Why is the option to retain $n\ge1$ ARIMA models with low but approximately equal AIC values (e.g. within 1 or 2 units of the minimum AIC) not a default in much of the time series forecasting software?
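To make the advice concrete, here is a minimal Python sketch of one way to read it: shortlist the models whose AIC lies within 2 units of the minimum, then pick the shortlisted model with the smallest forecast error on held-out recent data. The model names, the holdout values and the forecasts are hypothetical, the three AIC values are simply illustrative, and mean absolute error is my assumption for what "best forecasts" means; Chatfield does not prescribe a particular error measure.

```python
def shortlist_by_aic(aic_by_model, max_delta=2.0):
    """Return the names of models whose AIC is within max_delta of the minimum."""
    best = min(aic_by_model.values())
    return [name for name, aic in aic_by_model.items() if aic - best <= max_delta]


def mean_absolute_error(actual, forecast):
    """Average absolute difference between observed and forecast values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def select_by_holdout(aic_by_model, holdout_actual, holdout_forecasts, max_delta=2.0):
    """Among the AIC shortlist, pick the model with the lowest holdout error."""
    candidates = shortlist_by_aic(aic_by_model, max_delta)
    return min(
        candidates,
        key=lambda name: mean_absolute_error(holdout_actual, holdout_forecasts[name]),
    )


# Hypothetical inputs: AIC values from fits that exclude the holdout period,
# plus each model's forecasts for that period.
aic_by_model = {"model_a": 234.2, "model_b": 234.1, "model_c": 677.1}
holdout_actual = [10.0, 12.0, 11.0, 13.0]
holdout_forecasts = {
    "model_a": [10.5, 11.5, 11.0, 12.5],
    "model_b": [9.0, 13.0, 12.0, 14.5],
    "model_c": [10.0, 12.0, 11.0, 13.0],
}

shortlist = shortlist_by_aic(aic_by_model)
chosen = select_by_holdout(aic_by_model, holdout_actual, holdout_forecasts)
print(shortlist, chosen)
```

Note that model_c is never considered, however well it forecasts the holdout, because its AIC is far from the minimum. In practice the AIC values would come from models fitted without the most recent year, so that the holdout comparison is genuinely out of sample.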

(1) Chatfield, C. (1991). Avoiding statistical pitfalls. Statistical Science, 6(3), 240–252. Available online, URL: https://projecteuclid.org/euclid.ss/1177011686.

Sextus Empiricus
Hibernating
  • @Glen_b "An AIC is only high or low in comparison to another." - How can we think differently when we talk about model selection? We always look at low values rather than higher values. What's wrong with my fourth sentence? I think it states rather clearly that we are talking about differences (e.g. within one or two of the minimum AIC). There is no mention of absolute values of AIC in the question. – Hibernating Jan 09 '14 at 01:56
  • I removed "low" from the title and first sentence of the box, for the reasons that @hibernating pointed out. – Harvey Motulsky Mar 15 '14 at 13:04
  • @Harvey Motulsky Please put "low" back in both places. Thank you. – Hibernating Mar 15 '14 at 15:03
  • Your question is about how to compare models with the AICs have values fairly close together. High or low is irrelevant (and can change simply by changing the units the data are expressed in). So why put those words back? They are misleading. – Harvey Motulsky Mar 15 '14 at 15:54
  • @Harvey Motulsky First, the box is the quote and as such it should not be modified. Second, I know what to do in situations like AICc=c(234.2, 677.2,677.1) - my question was directed at AICc=c(234.2, 234.1, 677.1). Third, I pointed out no reasons that would prompt your edits or suggest that Chatfield might be misleading. So please put it back. – Hibernating Mar 15 '14 at 16:08
  • But "low" doesn't matter. You'd interpret AICc=c(1000234.2, 1000234.1, 1000677.1) the same way. It is the difference, not the ratio, of AICc values that matters. (I did revert the quote). – Harvey Motulsky Mar 15 '14 at 16:33
  • @Harvey Motulsky No matter what you put in the list of AIC values, there will always be values of AIC that are low and those that are high. I did not talk about ratios at all. If you reverted the quote on which this question is based, you should also revert the title of the question. You're wasting my time - it is the third time I am telling you that your edits are neither useful nor necessary, and that it is my will that the title of my question is consistent with the quote it is based on. – Hibernating Mar 16 '14 at 01:26
  • Does anyone else want to get involved here about the wording of the question? – Harvey Motulsky Mar 16 '14 at 04:10
  • It was meant to be a simple question, not a chat room. Maybe I should try again: imagine 3 ARIMA models displaying the following values of the AIC: 234.2,677.2,677.1. It is just plain silly to ask what to do with approximately equal values of AIC here ---everyone would ignore them and the corresponding models altogether. My question is not about "how to compare models with the AICs have values fairly close together" as you think (typos in the original). – Hibernating Mar 16 '14 at 07:36
  • My question is about the values of AICs that are both low (in comparison to other values in the list) and approximately equal, i.e. it deliberately excludes situations in which it is silly to ask what to do. The answer by Chatfield differs from what I learned from Akaike so I thought I would ask it here if there are other interesting options. This is what I wanted people to get involved about, not about the unimportant issue of which wording may best suit everyone. – Hibernating Mar 16 '14 at 07:37
  • @HarveyMotulsky I think the quote must stand as it is in the original, whether it makes sense that way or not - it's a quote. Ultimately the OP must be the arbiter of what their question says, even if it goes against the best available advice; if the OP is determined to have the question a particular way no matter what, I suggest allowing them to do so - if you think the question makes no sense that way, recommend closure (for example, under "unclear what you're asking"). If you're around, I'll be in chat for a few minutes. – Glen_b Mar 16 '14 at 09:26
  • @Hibernating. Thanks for clarifying. Now I understand. Would this wording for the question express your intent: "Choosing from a set of models when the two lowest AIC values are nearly equal" – Harvey Motulsky Mar 16 '14 at 13:29
  • @Harvey Motulsky Please let me be myself. I like the current and my original title "What do I do when values of AIC are low and approximately equal?" I tend to prefer "select" over "choose" in my statistical writing. I have a number of other preferences that characterize me as an individual and they are reflected in the way I form my questions and answers. I am glad you finally understood why I asked you to revert the changes. No problems. – Hibernating Mar 17 '14 at 03:11

1 Answer


It's true that when several AIC values are approximately equal, selecting the model with the lowest value may not be the best option. A sensible alternative is model averaging. This way you use not just the single best model for inference, but a set of "most supported" models, each weighted according to its AIC value.

There is a short introduction by Vincent Calcagno here.
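As a rough illustration of weighting by AIC, here is a minimal Python sketch using the usual Akaike-weight formula, $w_i = \exp(-\Delta_i/2) / \sum_j \exp(-\Delta_j/2)$ with $\Delta_i = \mathrm{AIC}_i - \mathrm{AIC}_{\min}$. That this particular weighting is the intended one is my assumption. The AIC values are the ones mentioned in the comments on the question, and the forecasts are hypothetical numbers chosen only to show the mechanics.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values into weights that sum to 1 (lower AIC, higher weight)."""
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference from the best.
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def average_forecasts(forecasts, weights):
    """Weighted average of the models' forecasts at each horizon step."""
    horizon = len(forecasts[0])
    return [
        sum(w * model_forecast[h] for w, model_forecast in zip(weights, forecasts))
        for h in range(horizon)
    ]


aic_values = [234.2, 234.1, 677.1]  # values quoted in the comments
weights = akaike_weights(aic_values)

# Hypothetical two-step-ahead forecasts from the three models.
forecasts = [[10.0, 11.0], [12.0, 13.0], [100.0, 100.0]]
combined = average_forecasts(forecasts, weights)
print(weights, combined)
```

With these inputs the two models with nearly equal AIC share almost all of the weight in roughly equal parts, and the third model contributes essentially nothing, so the combined forecast sits between the first two.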

Aghila