6

Is there in general a 'most appropriate' method to perform model selection in a time series context for forecasting purposes?

One can find in the literature a jungle of different information criteria (IC). To name a few: AIC, AICc, AICu, MAIC, TIC, FPE, FPEu, Mallows' $C_p$, SIC, HQ, HQc, EIC... Is anyone aware of a comprehensive review (or any good reason) to rank the performance of these criteria (theoretically, empirically, or even just qualitatively) in the context of time series model selection?

Moreover, as far as I understand, IC can only be used for hyperparameter tuning/variable selection within a model class. For example, I can use AIC to find the best $p^*$ and $q^*$ for an ARMA$(p,q)$ model, but I cannot compare the AIC value of an ARIMA$(1,1,1)$ with that of an ARIMA$(1,2,1)$ because the order of differencing differs, nor can I compare an ARMA with an ETS. Is it not incorrect, then, to refer to IC as methods for model selection rather than variable selection? Am I then left only with rolling forecasting origin CV (or some other modified CV) to do model selection in the strict sense?

Lastly, if that is the case, it seems that IC and LASSO methods are in direct competition. How does the performance of the best IC compare to that of LASSO? And more specifically, to 'which' LASSO? I would assume that the way in which we select the regularisation parameter $\lambda$ makes a significant difference: would it not be redundant, for instance, to select $\lambda$ through an IC (say AIC)?

An answer to any of the previous questions, or references, would be greatly appreciated.

semola
  • 1
    This seems rather broad to me. It may be too broad to really be answerable here. You may want to edit this to ask a narrower, concrete question. You can always ask follow-up questions, & link back, as you learn more. – gung - Reinstate Monica Jan 18 '17 at 00:54
  • Leaving aside all the details there is just one very specific question: is there a 'best' method for model selection in a time series context? I don't really agree when you say it's not answerable. I think an answer to this could even be 'at the moment there is no consensus over the community', or 'you can find your answer in article x,y, and z'. The details are there because even a partial answer to any of those points would be in a way satisfactory. – semola Jan 18 '17 at 02:09
  • 6
    If there were a single 'best' method, why would anyone have ever heard of any others, unless they do academic research on it as a curiosity? If the whole answer is 'No.', then we consider this to be not a good question for the site. On the other hand, if the proper answer is a / several books on how to assess the different particularities of your situation to make a defensible selection, then this is too broad. The SE system is not designed as a discussion forum or a place for "partial answers". – gung - Reinstate Monica Jan 18 '17 at 02:20
  • In this sense I find it perfectly reasonable to ask the question 'is there a study considering a bit more than 3 methods? something more comprehensive? something not discussing only ICs but including other methods as well?' – semola Jan 18 '17 at 02:31
  • On top of that, I am not an expert on the regulations of StackExchange, but as far as I am concerned an answer to the question 'is there a consensus over X' is of great interest to a scientific community. It is up to those who answer to write a meaningful 'No', and how meaningful it is will depend certainly on their knowledge of the topic. But you certainly have a higher reputation than I have within SE, and you certainly know regulations better. – semola Jan 18 '17 at 02:39
  • My guess is that this will be closed as too broad, but maybe it won't. Good luck with your question. – gung - Reinstate Monica Jan 18 '17 at 02:45
  • There exist studies showing empirically, for instance, that EIC performs better than AIC, BIC (Baki Billah, Rob J. Hyndman and Anne B. Koehler, Empirical information criteria for time series forecasting model selection) and OOS-CV, just as AIC is sometimes said to be more appropriate than BIC in a forecasting context. – semola Jan 18 '17 at 02:50
  • 2
    Note that the time series dimension here is relatively unimportant. You could ask the same question about cross sectional data, and the answers would be similar. – Richard Hardy Jan 21 '17 at 14:40
  • 1
    To see how one could go about comparing different criteria and selection methods in a rigorous way and for some results on information criteria, cross validation and other methods, I recommend this paper: http://www.econ2.jhu.edu/People/Wright/hw.pdf Other than that, I agree with @gung that the question is too broad. – Matthias Schmidtblaicher Jan 24 '17 at 21:12

1 Answer

2

If the focus is on time series and forecasting, then I would only consider rolling CV. When working with time series it is critical that the fit never sees future (as yet unknown) innovations: each model must be estimated only on data available at the forecast origin.
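As a rough illustration of rolling forecasting origin CV, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the three candidate forecasters, the simulated AR(1) series, the one-step horizon, the squared-error loss and the function names are hypothetical choices, not a recommended set of models. The point is only that each forecast is produced from data strictly before the origin, so candidates from different model classes can be scored on the same footing.

```python
import random


def naive_forecast(history):
    """Random-walk forecast: repeat the last observed value."""
    return history[-1]


def mean_forecast(history):
    """Forecast the sample mean of everything observed so far."""
    return sum(history) / len(history)


def ar1_forecast(history):
    """One-step AR(1) forecast with intercept, fitted by least squares on history only."""
    if len(history) < 3:
        return history[-1]  # too little data to fit; fall back to the naive forecast
    x, y = history[:-1], history[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return history[-1]  # constant history; the slope is not identified
    phi = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - phi * mx
    return intercept + phi * history[-1]


def rolling_origin_mse(series, forecaster, min_train):
    """Mean squared one-step error over an expanding window.

    At each origin the forecaster sees only series[:origin], never the
    value it is asked to predict or anything after it.
    """
    errors = []
    for origin in range(min_train, len(series)):
        prediction = forecaster(series[:origin])
        errors.append((series[origin] - prediction) ** 2)
    return sum(errors) / len(errors)


def simulate_ar1(n, phi, seed=0):
    """Simulate a zero-mean AR(1) series with standard normal innovations."""
    rng = random.Random(seed)
    values = [0.0]
    for _ in range(n - 1):
        values.append(phi * values[-1] + rng.gauss(0.0, 1.0))
    return values


candidates = {"naive": naive_forecast, "mean": mean_forecast, "ar1": ar1_forecast}
series = simulate_ar1(300, phi=0.7)
scores = {name: rolling_origin_mse(series, f, min_train=50) for name, f in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Nothing here requires the candidates to share a likelihood or a differencing order, which is why this kind of comparison also works across model classes such as ARIMA and ETS.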

ICs estimate out-of-sample prediction error by penalizing the in-sample model fit (through degrees of freedom or other quantities). These formulas were devised when computing power and data were limited, so an analytical shortcut was more efficient than repeated refitting.
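For reference, the two most familiar criteria have the standard forms below. The symbols are introduced here for illustration only: $\hat{L}$ is the maximised likelihood, $k$ the number of estimated parameters and $n$ the number of observations used in the fit.

$$\mathrm{AIC} = -2\ln\hat{L} + 2k, \qquad \mathrm{BIC} = -2\ln\hat{L} + k\ln n$$

In both cases the first term rewards in-sample fit and the second is the penalty that stands in for an actual out-of-sample evaluation; BIC (called SIC in the question) penalizes each extra parameter more heavily than AIC once $\ln n > 2$, that is for $n \ge 8$.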

Robert Kubrick