AIC values for auto.arima

Question

I am having trouble understanding why auto.arima suggests the model orders it does. I have a time series with multiple seasonalities and I am trying to forecast future values using STL+ARIMA. I have been following Hyndman's book (https://otexts.org/fpp2/complexseasonality.html). However, all the examples I have come across use the auto.arima function rather than specifying the ARIMA orders explicitly.

I used the mstl() function to extract the remainder component of the decomposition and then tried to find ARIMA orders for that remainder. I am not sure whether this is the right approach; I just wanted to check what auto.arima was suggesting.

auto.arima suggested ARIMA(0,1,2): AIC = 6759.638, SSE = 4774186, p-value = 0.616866. However, the majority of the other models I tried had smaller AIC values, for example ARIMA(8,0,7): AIC = 6691.219, SSE = 3820932, p-value = 0.9947854, or ARIMA(3,1,7): AIC = 6709.794, SSE = 4177168, p-value = 0.9993566.

I don't understand whether what I am doing is correct or why this happens. I have checked the residuals as well:

Residuals from STL + ARIMA(0,1,2): Q* = 39.068, df = 18, p-value = 0.00279. Model df: 2. Total lags used: 20
Residuals from STL + ARIMA(3,1,7): Q* = 5.5735, df = 10, p-value = 0.8497. Model df: 10. Total lags used: 20

Any suggestions on how to approach this situation? The code is below.

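library(forecast)

# Hourly series with daily (24) and weekly (168) seasonal periods
data.hourly.msts <- msts(ts.data, seasonal.periods = c(24, 168), start = c(1, 1))
fixed.nValid <- 48
fixed.nTrain <- length(ts.data) - fixed.nValid

# Training window: everything except the last 48 observations
datatrain.msts <- window(data.hourly.msts, start = c(1, 1), end = c(1, fixed.nTrain))

a <- mstl(datatrain.msts)

# Column 5 is the remainder (columns: Data, Trend, Seasonal24, Seasonal168, Remainder)
b <- a[, 5]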


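# Grid search over ARIMA(p-1, d-1, q-1), i.e. AR and MA orders 0..9 and d in 0..1
for (p in 1:10) {
  for (q in 1:10) {
    for (d in 1:2) {
      if (p + d + q <= 20) {
        model <- arima(x = b, order = c(p - 1, d - 1, q - 1),
                       optim.control = list(maxit = 2500), method = "ML")
        # Box-Pierce test (the Box.test default type) on the model residuals
        pval <- Box.test(model$residuals, lag = log(length(model$residuals)))
        sse <- sum(model$residuals^2)
        cat(p - 1, d - 1, q - 1, 'AIC=', model$aic, ' SSE=', sse, ' p-VALUE=', pval$p.value, '\n')
      }
    }
  }
}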


# STL + ARIMA with a fixed order, then a 48-step-ahead forecast
data.hourly.stlm <- stlm(data.hourly.msts, s.window = "periodic",
                         modelfunction = Arima, order = c(8, 0, 7))
data.hourly.stlm.pred <- forecast(data.hourly.stlm, h = 48)

score 2 · Answer 1 · answered Jan 10 '19 at 08:23

There are two default parameters to auto.arima() that are relevant here.

First, auto.arima() by default limits the AR and MA orders, as well as the amount of differencing. The relevant parameters and their default values are max.p=5, max.q=5, max.d=2. There are very good reasons for this default behavior; see the threads "Why does default auto.arima stop at (5,2,5)?" and "Order of ARMA models".
Second, even if you were to specify higher maximum orders, auto.arima() might not find the AIC-minimal model, because by default it does not search all possible models but proceeds in a greedy stepwise fashion (stepwise=TRUE).

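You can of course both increase the maximum orders and set stepwise=FALSE. Note that with stepwise=FALSE the max.order parameter (default 5) also caps p+q+P+Q, so it needs to be raised as well. Be prepared for a long model selection and fitting process, especially if you have a long time series or multiple series.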
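As a rough sketch of such a wider search, assuming the forecast package is installed: the series `remainder` below is simulated and only stands in for the STL remainder from the question, and the limits of 6 are arbitrary illustrative values, not recommendations.

```r
library(forecast)

# Simulated stand-in for the STL remainder (an assumption for illustration);
# replace it with your own remainder series.
set.seed(1)
remainder <- arima.sim(model = list(ar = c(0.6, -0.3), ma = 0.4), n = 300)

# Default behaviour: stepwise search with max.p = 5 and max.q = 5.
fit_default <- auto.arima(remainder, seasonal = FALSE)

# Wider, exhaustive search. With stepwise = FALSE, max.order caps p + q (+ P + Q),
# so it is raised together with max.p and max.q. approximation = FALSE makes
# the search compare exact likelihoods, at the cost of run time.
fit_wide <- auto.arima(remainder, seasonal = FALSE,
                       max.p = 6, max.q = 6, max.order = 12,
                       stepwise = FALSE, approximation = FALSE)

print(arimaorder(fit_default))
print(arimaorder(fit_wide))
```

Two further points are worth keeping in mind when comparing with a hand-rolled grid: auto.arima() minimises AICc by default (ic="aicc") rather than AIC, and it chooses d with a unit root test rather than by information criterion, since likelihood-based criteria are not comparable across different orders of differencing.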

In any case, as per above, there are good reasons not to spend too much time on higher ARIMA orders. The authors of the forecast package know what they are doing, and higher orders may well yield lower AICs, but they will typically not lead to smaller out-of-sample forecast errors. Have you assessed your more complicated models on holdout samples? I would strongly recommend you do so.
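A minimal sketch of such a holdout comparison, assuming the forecast package: the series `y` is simulated, and the two orders are arbitrary examples of a "small" and a "larger" model, not the models from the question.

```r
library(forecast)

# Simulated stand-in series (an assumption for illustration only).
set.seed(2)
y <- arima.sim(model = list(order = c(2, 1, 1), ar = c(0.5, 0.2), ma = 0.4), n = 347)

# Hold out the last 48 observations, mirroring fixed.nValid in the question.
n_valid <- 48
train <- subset(y, end = length(y) - n_valid)
test  <- subset(y, start = length(y) - n_valid + 1)

# Fit a low-order and a higher-order model on the training part only.
fit_low  <- Arima(train, order = c(0, 1, 2), method = "ML")
fit_high <- Arima(train, order = c(2, 1, 2), method = "ML")

# Compare out-of-sample accuracy on the holdout rather than in-sample AIC.
rmse_low  <- accuracy(forecast(fit_low,  h = n_valid), test)["Test set", "RMSE"]
rmse_high <- accuracy(forecast(fit_high, h = n_valid), test)["Test set", "RMSE"]

cat("Holdout RMSE, low order: ", rmse_low,  "\n")
cat("Holdout RMSE, high order:", rmse_high, "\n")
```

A single holdout window can be noisy, so repeating the comparison over several forecast origins gives a more reliable picture than one split.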

Thanks for the answer. Very helpful. I actually have compared auto.arima models and higher-order ARIMA models over a variety of holdout samples. The higher-order ARIMA models tend to perform better in every case. It's just that everywhere I read, it's always the same advice: don't use higher-order ARIMA, choose lower orders instead. That's why it's so confusing for me, as I feel I am missing something important. - Alex Stepanov, Jan 10 '19 at 08:35
That is interesting. Have you tried custom methods for multiple seasonalities, like BATS or TBATS? - Stephan Kolassa, Jan 10 '19 at 08:40
Yes, I have tried TBATS. Across the 30 tests I have run, it shows roughly similar MAPE results to the higher-order STL+ARIMA models, e.g. 2.3% and 2.4%. I have also tried ARIMA with Fourier terms. It tends to perform worse, e.g. 4-5%. - Alex Stepanov, Jan 10 '19 at 08:48
