
I noticed that auto.arima frequently selects a different model than the one I simulated with arima.sim, so I tested it crudely:

library(forecast)   # auto.arima() comes from the forecast package

results = c(0, 0)   # results[1] counts "correct" fits, results[2] everything else
for(i in 1:1000){
  m = auto.arima(
    arima.sim(model = list(order = c(2,0,0), ar = c(0.3, 0.5)), n = 100)
  )
  # crude check: count the fit as correct if it has an ar2 coefficient and no ma1 coefficient
  if(!"ma1" %in% names(m$coef)){
    if("ar2" %in% names(m$coef)){
      results[1] = results[1] + 1
    }else{
      results[2] = results[2] + 1
    }
  }else{
    results[2] = results[2] + 1
  }
}
results
[1] 265 735

In this test, auto.arima is only right 36% of the time.

Am I messing up the settings of auto.arima or arima.sim? Or, are the models given by auto.arima equivalent in some way?
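
For a single simulated series, the selected order can also be checked directly; a minimal sketch using arimaorder() from the forecast package (seed and variable names chosen only for illustration):

library(forecast)

set.seed(1)   # arbitrary seed, for reproducibility
y <- arima.sim(model = list(order = c(2, 0, 0), ar = c(0.3, 0.5)), n = 100)
fit <- auto.arima(y)
arimaorder(fit)   # selected (p, d, q); the true order is (2, 0, 0)
fit$coef          # estimated coefficients of the selected model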

--- edit ---

I upped the sample size on arima.sim to n=1000, and auto.arima was correct 60% of the time.

So... should I just not use auto.arima unless I have a lot of data, or are the models selected by auto.arima, although different from the simulated one, equivalent in some way?

  • `arima.sim` only gives you a single realisation from the model, so there is no way that `auto.arima` can know with 100% certainty the true model. – Jarle Tufto Apr 07 '20 at 19:47
  • But is 36% accurate a little low for a random sample of ARIMA(2, 0, 0) simulations? – Frank Apr 07 '20 at 19:49
  • That depends on the sample size. – Jarle Tufto Apr 07 '20 at 19:50
  • Upped the sample size to 1000; now it's 60.6% accurate. Seems like I need a very big sample size to get decent results. – Frank Apr 07 '20 at 19:59
  • Whether or not the results are decent depends upon more than just whether it got the model orders right - it also depends on the parameter estimates. Two models may have different orders, but they will also have different parameter estimates, and may therefore generate predictions that are a lot more similar than the binary "did auto.arima match the true model order exactly?" question can indicate. – jbowman Apr 07 '20 at 22:58
  • How often would say, a 5 step ahead forecast with an ARIMA(0, 1, 1) be very similar to one with an ARIMA(1, 1, 0), if fit on the same data? What about a 10 step ahead? – Frank Apr 07 '20 at 23:23
  • Should you not use `auto.arima`? Good luck finding a better algorithm! @jbowman is also right. – Richard Hardy Apr 08 '20 at 06:17

1 Answer


No, you did not make an error in your code (that I can see).

It is unfortunately the case that automatic ARIMA model selection rarely recovers the model that was used to simulate the input data. This is simply a consequence of there usually being very little data compared to the number of parameters to be estimated. After all, we can't just count the two AR parameters the true data generating process used - auto.arima() searches through many different possible models. And if seasonality might be an issue, the number of possible models increases yet further.
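
To get a sense of the size of that search space, here is a back-of-the-envelope count under auto.arima()'s default non-seasonal limits (max.p = 5, max.d = 2, max.q = 5); this is only an illustration, not how the function itself enumerates candidates, and the default stepwise search does not visit every one of them:

max_p <- 5; max_d <- 2; max_q <- 5              # auto.arima() defaults for non-seasonal models
n_candidates <- (max_p + 1) * (max_d + 1) * (max_q + 1) * 2   # times 2: with/without a constant
n_candidates                                    # 216 candidate models vs. 2 true AR parameters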

I have long had a nagging suspicion that the main reason why ARIMA is popular is not because it does a great job in modeling real time series, but because it is a time series model that you can actually prove mathematical theorems about. (Exponential smoothing, as an alternative, used to be mainly a heuristic, and was only put on solid mathematical grounds via state space models about 15 years ago.) This suspicion of mine is reinforced by the description of how the original forecasting competition by Makridakis - where ARIMA method already performed poorly - was received by statisticians (Hyndman, 2020, IJF - very much recommended):

Many of the discussants seem to have been enamoured with ARIMA models. ... At times, the discussion degenerated to questioning the competency of the authors.

Also, it seems like there are few truly convincing real-life examples of moving average processes. If you restricted your search to AR(p) models only, auto.arima() might be better at finding the true model. And even then, the parameters would only be estimated and differ from the true values, of course.
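
If you want to try that restriction, auto.arima() exposes its search limits as arguments; a minimal sketch, with y a simulated series as in the question:

library(forecast)

set.seed(1)
y <- arima.sim(model = list(order = c(2, 0, 0), ar = c(0.3, 0.5)), n = 100)
# restrict the search to pure AR(p) models: no differencing, no MA or seasonal terms
fit_ar <- auto.arima(y, d = 0, max.q = 0, max.Q = 0, seasonal = FALSE)
arimaorder(fit_ar)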

Note that I do not think this is a shortcoming of the auto.arima() implementation. Quite to the contrary. There are few approaches to fitting ARIMA models that I would trust more. Rob Hyndman is probably the one person who knows most about this. For instance, I would not assume that a Box-Jenkins approach would be better.

In any case, what you did is a great exercise to understand the shortcomings of ARIMA fitting. I wish many more time series analysis and forecasting courses included such a short lesson in humility. I would encourage you to play around a little more with your approach, maybe include seasonal models, or other AR or even MA parameters, or allow or disallow the Box-Cox transformation.
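
As one possible starting point, here is a sketch of such an experiment wrapped in a small helper (recovery_rate is a made-up name, and the exact-order check via arimaorder() is just one way of defining "correct"):

library(forecast)

# share of replications in which auto.arima() recovers the exact (p, d, q) used for simulation
recovery_rate <- function(true_order, pars, n_obs, n_reps = 200) {
  hits <- replicate(n_reps, {
    y <- arima.sim(model = c(list(order = true_order), pars), n = n_obs)
    all(arimaorder(auto.arima(y))[1:3] == true_order)
  })
  mean(hits)
}

recovery_rate(c(2, 0, 0), list(ar = c(0.3, 0.5)), n_obs = 100)    # the setup from the question
recovery_rate(c(1, 0, 1), list(ar = 0.5, ma = 0.4), n_obs = 100)  # add an MA term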


All that said, why do you want to fit an ARIMA model? Most people will want to do so to forecast. (Yes, there are also other motivations.) In this case, it would make sense to expand your experiments to also include the forecasting step. This is what I do below. I simulated 100 (or 1000) historical data points plus 50 holdout data points using arima.sim(), then fitted an ARIMA model using auto.arima(), calculated 90% interval forecasts using forecast(), counted how often these interval forecasts covered the true value and finally plotted the average coverage against the forecast horizon. Results and code are below. It would seem to me that at least the interval forecasts are "good enough". (The apparent upward trend in the plot for a history length of 100 looks intriguing. I don't know where it comes from.)

[Plot: interval forecast coverage by horizon, history length 100]

[Plot: interval forecast coverage by horizon, history length 1000]

library(forecast)
n_sims <- 1000
n_history <- 100
n_forecast <- 50
true_model <- list(ar=c(0.3,0.5))
confidence_level <- 0.9
hit <- matrix(NA,nrow=n_sims,ncol=n_forecast)   # coverage indicator per simulation and horizon

pb <- winProgressBar(max=n_sims)   # Windows-only; txtProgressBar() is a cross-platform alternative
for ( ii in 1:n_sims ) {
    setWinProgressBar(pb,ii,paste(ii,"of",n_sims))
    set.seed(ii)    # for reproducibility
    actuals <- arima.sim(model=true_model,n=n_history+n_forecast)
    model <- auto.arima(actuals[1:n_history])
    fcst <- forecast(model,h=n_forecast,level=confidence_level)

    # does the 90% interval cover the realised future value at each horizon?
    hit[ii,] <- fcst$lower<=tail(actuals,n_forecast) & tail(actuals,n_forecast)<=fcst$upper
}
close(pb)

coverage <- apply(hit,2,sum)/n_sims   # empirical coverage per forecast horizon
plot(coverage,xlab="Forecast horizon",ylab="Coverage",las=1,
    main=paste("History length:",n_history),
    type="o",pch=19,ylim=range(c(coverage,confidence_level)))
abline(h=confidence_level,lwd=2,col="red")

An alternative you might want to explore would be to compare the expectation point forecast against various possible "optimal" cases, e.g.:

  • against the point forecast if the true model is known (including the parameters) - see the sketch after this list
  • against the point forecast if the true model and the Box-Cox transformation is known, but the parameters need to be estimated
  • against the point forecast if the true model is known, but the Box-Cox transformation and the parameters need to be estimated
  • against the point forecast if auto.arima() can only choose among AR(p) models for $p=0, \dots, 5$ or so
  • ...
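
For the first bullet, a minimal sketch of what such a comparison could look like (variable names are only for illustration; arima() may warn that it sets transform.pars = FALSE when the AR coefficients are fixed):

library(forecast)

set.seed(1)
n_history <- 100
n_forecast <- 10
actuals <- arima.sim(model = list(order = c(2, 0, 0), ar = c(0.3, 0.5)),
                     n = n_history + n_forecast)
history <- actuals[1:n_history]

# point forecast from the model auto.arima() selects
fc_auto <- forecast(auto.arima(history), h = n_forecast)$mean

# point forecast from the true model with known parameters:
# fix the AR coefficients at their true values so nothing has to be estimated
true_fit <- arima(history, order = c(2, 0, 0), include.mean = FALSE, fixed = c(0.3, 0.5))
fc_true <- predict(true_fit, n.ahead = n_forecast)$pred

cbind(auto = fc_auto, true = fc_true)   # compare the two forecast paths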
  • I appreciate the answer. I'm wondering what the percent difference is between the upper or lower bounds and the point forecast. That second experiment / bullet point is also very interesting – Frank Apr 08 '20 at 14:58
  • I would be careful about looking at percentage differences, since the point forecast for an ARIMA model without a mean can easily be close to zero, or even negative. – Stephan Kolassa Apr 08 '20 at 14:59