
I am currently running an iterative process (a for loop) to determine the best ARIMA model for monthly sales data according to the smallest AIC and MAPE. Box-Jenkins methodology clearly states to choose the order of the ARIMA components according to smallest AIC and validate according to out-of-sample accuracy measures, such as MAPE.

My question is:

Is over-fitting a risk if I choose my models according to the smallest MAPE instead of AIC? Are there any risks/disadvantages?

Richard Hardy
  • *Box-Jenkins methodology clearly states to choose the order of the ARIMA components according to smallest AIC* does not seem to be a common interpretation of the Box-Jenkins methodology. Determining lag orders by ACF and PACF is a more common interpretation. – Richard Hardy Jan 23 '20 at 19:08
  • You're right, thank you for pointing that out. The course where I learned about the methodology mentioned the two terms together multiple times. I guess the AIC was added to the method later. – Andrei Catana Jan 24 '20 at 09:45

1 Answer


Following @Richard Hardy's excellent comment, and paraphrasing @Adamo's advice in "Interrupted Time Series Analysis - ARIMAX for High Frequency Biological Data?":

"The correlograms i.e. the acf and the pacf should be calculated from residuals using a model that controls for intervention administration, otherwise the intervention effects are taken to be Gaussian noise, underestimating the actual autoregressive effect."

In other words, if there is latent deterministic structure, the ACF and PACF can be quite misleading. One therefore needs to examine data that has been conditioned on possible pulses, level/step shifts, seasonal pulses, and deterministic time trends.
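As a rough illustration of how deterministic structure can distort a correlogram, here is a minimal sketch using only the Python standard library. The simulated series, the shift location, and the helper `sample_acf` are all assumptions made up for this example; it covers only the level-shift case, and other interventions (pulses, for instance) may distort the ACF in a different direction.

```python
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


random.seed(0)
n, shift_at, shift_size = 200, 100, 5.0  # hypothetical values

# White noise plus one level shift: there is no autoregressive structure here.
noise = [random.gauss(0.0, 1.0) for _ in range(n)]
series = [e + (shift_size if t >= shift_at else 0.0) for t, e in enumerate(noise)]

# Condition on the level shift (assumed known) by removing each segment's mean.
before, after = series[:shift_at], series[shift_at:]
mean_before = sum(before) / len(before)
mean_after = sum(after) / len(after)
adjusted = [v - mean_before for v in before] + [v - mean_after for v in after]

acf_raw = sample_acf(series, 3)
acf_adjusted = sample_acf(adjusted, 3)
print("raw ACF, lags 1-3:     ", [round(r, 2) for r in acf_raw])
print("adjusted ACF, lags 1-3:", [round(r, 2) for r in acf_adjusted])
```

The raw series shows large autocorrelations that come entirely from the unmodelled shift, while the adjusted series looks close to white noise. In real data the shift date is usually unknown and would have to be detected rather than assumed.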

Mindlessly fitting all possible models can, and often will, produce redundancies, unnecessary differencing and power transformations, and over-populated structure. KISS (parsimony) should be the objective. Care should be taken to verify that the parameters are constant over time and that the model error variance is not flagrantly heterogeneous over time.
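One crude way to look at parameter constancy and error-variance stability is to fit the same simple model on separate stretches of the data and compare the estimates. The sketch below does this for an AR(1) model on simulated data, using only the standard library. The data-generating process, the split point, and the helper `fit_ar1` are hypothetical, and a formal procedure (a Chow-type test, for example) would normally replace this eyeball comparison.

```python
import random


def fit_ar1(x):
    """Least-squares AR(1) fit on demeaned data: returns (phi, residual variance)."""
    mean = sum(x) / len(x)
    d = [v - mean for v in x]
    phi = sum(a * b for a, b in zip(d[1:], d[:-1])) / sum(b * b for b in d[:-1])
    resid = [a - phi * b for a, b in zip(d[1:], d[:-1])]
    return phi, sum(r * r for r in resid) / len(resid)


random.seed(1)
n = 400  # hypothetical sample size

# Simulated series whose AR(1) coefficient changes from 0.2 to 0.8 halfway through.
x, prev = [], 0.0
for t in range(n):
    phi_true = 0.2 if t < n // 2 else 0.8
    prev = phi_true * prev + random.gauss(0.0, 1.0)
    x.append(prev)

phi_full, var_full = fit_ar1(x)
phi_first, var_first = fit_ar1(x[: n // 2])
phi_second, var_second = fit_ar1(x[n // 2:])

print(f"full sample: phi={phi_full:.2f}, residual variance={var_full:.2f}")
print(f"first half:  phi={phi_first:.2f}, residual variance={var_first:.2f}")
print(f"second half: phi={phi_second:.2f}, residual variance={var_second:.2f}")
```

A single full-sample fit hides the change in the coefficient, whereas the split fits make it visible. The same comparison applied to the residual variances gives a quick check for heterogeneous error variance.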

https://autobox.com/pdfs/ARIMA%20FLOW%20CHART.pdf suggests a script/paradigm that can often be useful.

IrishStat