I respectfully disagree with the accepted answer.
First of all, the fact that ARIMA models do not forecast well in forecasting competitions is not a weakness of ARIMA; it is evidence that the stochastic process that produced the time series in question was something other than ARIMA, and that ARIMA should not have been used in the first place. A time series with nonlinear dependence, for example, will obviously not be forecast well by ARIMA, but that is hardly a shortcoming of ARIMA. If you simulate a time series from an ARIMA process, an ARIMA model will do a spectacular job at prediction. If using the wrong tool results in poor performance, that is not evidence that the tool is flawed.
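To illustrate that last point, here is a minimal sketch (assuming Python with numpy and statsmodels, which are not part of the original answer): simulate a series from a known ARMA process, fit an ARIMA model of the correct order, and check the one-step-ahead forecast error on held-out data, which should be close to the best achievable, namely the innovation standard deviation.

```python
# Minimal sketch: simulate from a known ARMA(1,1) process, fit an ARIMA of the
# correct order, and evaluate one-step-ahead forecasts on a hold-out sample.
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(42)

# True data-generating process: (1 - 0.7 L) y_t = (1 + 0.4 L) e_t, with e_t ~ N(0, 1)
ar = np.array([1, -0.7])
ma = np.array([1, 0.4])
y = ArmaProcess(ar, ma).generate_sample(nsample=600, distrvs=rng.standard_normal)

train, test = y[:500], y[500:]

# Fit the correctly specified model on the training portion
fit = ARIMA(train, order=(1, 0, 1)).fit()

# Apply the fitted parameters to the hold-out data (no refitting) and take
# one-step-ahead predictions over that portion
full = fit.append(test)
pred = full.predict(start=len(train), end=len(y) - 1)

rmse = np.sqrt(np.mean((pred - test) ** 2))
print(f"one-step-ahead RMSE on hold-out: {rmse:.3f}")  # close to the innovation std of 1.0
```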
Secondly, if ARIMA models are to be faulted for their forecasting performance, then in my opinion a strong case can be made only about their doing poorly in long-term forecasting. They could also be faulted for assuming that the error term is white noise with a constant variance, which translates into a constant prediction error. But the GARCH family of models can help accommodate time-varying, autocorrelated variance.
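As an illustration of how the GARCH remedy plugs in, here is a minimal sketch (assuming the Python packages statsmodels and arch, neither of which is referenced above): fit an ARIMA model for the conditional mean, then a GARCH(1, 1) model on its residuals so that the prediction-error variance is allowed to vary over time.

```python
# Minimal sketch: ARIMA for the conditional mean, GARCH(1,1) on its residuals
# for a time-varying conditional variance. The series below is only a placeholder.
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA
from arch import arch_model

rng = np.random.default_rng(0)
y = ArmaProcess([1, -0.6], [1]).generate_sample(nsample=1000, distrvs=rng.standard_normal)

# Conditional mean model
mean_fit = ARIMA(y, order=(1, 0, 1)).fit()
resid = mean_fit.resid

# GARCH(1,1) on the residuals: sigma_t^2 = omega + alpha * e_{t-1}^2 + beta * sigma_{t-1}^2
vol_fit = arch_model(resid, mean="Zero", vol="GARCH", p=1, q=1).fit(disp="off")
print(vol_fit.params)                     # omega, alpha[1], beta[1]
sigma_t = vol_fit.conditional_volatility  # time-varying prediction-error scale
```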
Another point is that it is trendy to discard old methodologies in favor of new, recently minted ones. In the forecasting competitions, the message seems to be that we should discard ARIMA for machine learning methods. But
- there have been plenty of fads in statistics/econometrics. Check out the article titled "Economists are prone to fads, and the latest is machine learning". Saying the words "machine learning" in an interview these days might help you get the job, but that is hardly evidence of substance.
- I could organize a forecasting competition in which I select the time series so as to champion any particular family of models you name; for instance, I could select the series so that ARIMA models do best and machine learning methods do poorly.
If, as whuber points out, a case could be made that there are hardly any real-world phenomena driven by ARIMA processes, then a case could be made for the limited applicability of ARIMA. To me, the biggest finding from the forecasting competitions is that combinations of various methods have consistently outperformed, on average and across competitions, any particular method. This seems to support the statement that real time series come from much more complicated processes than those in our arsenal of models (which includes both ARIMA and machine learning models). However, that is an indictment of any individual model type, be it an ARIMA model or a machine learning model. Yet somehow the conclusion is erroneously translated into "machine learning good; ARIMA bad".
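To make the combination idea concrete, here is a minimal sketch of an equal-weight forecast average; the arrays f_arima and f_ml are hypothetical placeholders standing in for the forecasts of any two fitted models, not results from the competitions.

```python
# Minimal sketch: equal-weight combination of forecasts from two (or more) models.
import numpy as np

def combine_forecasts(*forecasts, weights=None):
    """Weighted average of several forecast vectors; equal weights by default."""
    stacked = np.vstack(forecasts)
    if weights is None:
        weights = np.full(stacked.shape[0], 1.0 / stacked.shape[0])
    return np.average(stacked, axis=0, weights=weights)

# Hypothetical forecasts from two different models for the same three periods
f_arima = np.array([10.2, 10.5, 10.9])
f_ml = np.array([9.8, 10.1, 11.3])
print(combine_forecasts(f_arima, f_ml))  # [10.  10.3 11.1]
```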