
I have hourly demand data for taxi rides spanning several years into the past. I want to use it to forecast future demand (for the next day). Robert Nau warns against using a mixed ARMA model:

you should generally avoid using both AR and MA terms in the same nonseasonal ARIMA model: they may end up working against each other and merely canceling each other’s effects.

I'm not sure I understand why they cancel each other. Can you explain the mathematical intuition?

Also, I saw that Hyndman doesn't follow Nau's advice when dealing with demand data (much like my data): he simply uses `auto.arima` and searches for the best model (the one that minimizes the AICc).

I think the source of my confusion is that I don't understand in what circumstances AR and MA processes cancel each other, and when we should avoid them. Is this a manifestation of a multicollinearity problem, or is it something else I should worry about?

Richard Hardy
ihadanny
  • That is an... *interesting*... statement. I have never seen advice like this in my 13+ years of time series forecasting (admittedly, I'm not an expert on ARIMA), nor any evidence or theoretical argument that would support it. I'd be very interested in an answer. – Stephan Kolassa Sep 27 '19 at 16:07
  • @StephanKolassa, Nau and his Duke notes appear very high in any Google search I do for time series forecasting... – ihadanny Sep 27 '19 at 16:20
  • Yes, I know the name, though I have never come across him. He is not just some nobody. Which is why this statement surprises me a bit. – Stephan Kolassa Sep 27 '19 at 16:23
  • There are some potential issues in, for example, an ARMA(1,1), where the likelihood will be constant in the subspace where $\phi = \theta$ (or $\phi = -\theta$, depending on the parametrization), because those terms will "cancel" and collapse to ARMA(0,0). Is that what you're asking about? I don't think this precludes using mixed AR and MA models, but it is something you should be aware of, I guess. – Chris Haug Sep 27 '19 at 17:14
  • Agreeing with Chris Haug, I will add that this is specifically avoided in `auto.arima`, which uses `Arima` for estimation and checks for such (approximate and exact) cancellations, ruling such models out. And since ARMA is more parsimonious than pure AR or pure MA, the advice sounds weird. – Richard Hardy Sep 27 '19 at 17:30
  • No need to avoid mixed models... just identify correctly in an iterative way, not in a try-all way, which often leads to over-modelling incorporating self-cancelling features, leading to inflated forecast error variance. – IrishStat Oct 05 '19 at 14:35
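To make the cancellation mentioned in the comments concrete: in the parametrization $y_t = \phi y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}$, setting $\theta = -\phi$ makes the AR and MA lag polynomials identical, so they cancel and the model collapses to white noise. A minimal numpy sketch (seed, sample size, and $\phi = 0.5$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
e = rng.normal(size=n)
phi, theta = 0.5, -0.5  # theta = -phi: the AR and MA polynomials share a root

# Simulate ARMA(1,1): y[t] = phi*y[t-1] + e[t] + theta*e[t-1]
y = np.empty(n)
y[0] = e[0]
for t in range(1, n):
    y[t] = phi * y[t - 1] + e[t] + theta * e[t - 1]

# The factors (1 - phi*L) and (1 + theta*L) cancel, so y is exactly the noise e
print(np.max(np.abs(y - e)))  # ~0 up to floating-point error
```

Since the simulated series is indistinguishable from its shocks, any pair $(\phi, -\phi)$ fits it equally well, which is the flat ridge in the likelihood that Chris Haug describes.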

1 Answer


This is a comment, but too long. I looked at the cited paper by Robert Nau, and here is the actual citation (page 6 of the pdf):

You should try to avoid using “mixed” models in which there are both AR and MA coefficients, except in very special cases.

with this footnote:

An exception to this is that If you are working with data from physics or engineering applications, you may encounter mixed ARIMA(p, 0, p-1) models for values of p that are 2 or larger. This model describes the discrete-time behavior of a system that is governed by a p-order linear differential equation, if that means anything to you. For example, the motion of a mass on a spring that is subjected to normally distributed random shocks is described by an ARIMA(2, 0, 1) model if it is observed in discrete time. If two such systems are coupled together, you would get an ARIMA(4, 0, 3) model.

Also, among his list of typical models, he includes one model that breaks this advice:

  • ARIMA(1, 1, 2) = linear exponential smoothing with damped trend (leveling off)

showing that the advice is meant to be tentative. The paper is an instructional one aimed at business students, and much of the advice is qualified with "... for a business application".

There is a lot of other interesting advice; one example citation (page 20 of the pdf):

If you apply one or more first-difference transformations, the autocorrelations are reduced and eventually become negative, and the signature changes from an AR signature to an MA signature. An AR signature is often the signature of a series that is “slightly underdifferenced,” while an MA signature is often the signature of a series that is “slightly overdifferenced.” If you apply one difference too many, you will get a very strong pattern of negative autocorrelation.
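The over-differencing effect in that quote is easy to see numerically: differencing white noise (which needs no differencing at all) yields an MA(1) with coefficient $-1$, whose theoretical lag-1 autocorrelation is $-0.5$ — the "very strong pattern of negative autocorrelation". A minimal numpy sketch (seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
e = rng.normal(size=100_000)  # white noise: needs no differencing
d = np.diff(e)                # one difference too many -> MA(1): d[t] = e[t] - e[t-1]

# Lag-1 sample autocorrelation; theory for an MA(1) with coefficient -1 gives -0.5
r1 = np.corrcoef(d[:-1], d[1:])[0, 1]
print(round(r1, 2))  # close to -0.5
```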

kjetil b halvorsen
  • I have never heard of such advice, but I have heard of keeping the sum $p + q$ to some low number, because the greater the sum, the greater the chance of overfitting. Recall that an AR(1) is an MA($\infty$), so if $p = 1$, you've covered some not-small part of the parameter space. – mlofton Sep 27 '19 at 20:52
  • Nau's advice (in this case) is dead-on correct. – IrishStat Oct 05 '19 at 14:34
  • @IrishStat: Could you please expand on that, i.e. giving some reasons why it is true? Theorems? – kjetil b halvorsen Oct 05 '19 at 14:36
  • The simple explanation is that if the process is white noise and you incorrectly difference it, the resulting transformed series is a (0,0,1) with an MA coefficient of 1.0. Also note that if you take the original white noise series and fit a (1,0,1) model, there is an infinite number of solutions that can arise, each of which will have two coefficients of the same value but with different signs. QED – IrishStat Oct 05 '19 at 14:43
  • Could you expand that into a formal answer, please? – kjetil b halvorsen Oct 05 '19 at 14:45
  • My comments and @Wayne's comments at https://stats.stackexchange.com/questions/15519/what-more-does-differencing-d0-do-in-arima-than-detrend suggest the Slutsky effect (Slutsky theorem), where unwarranted filtering results in a variance-inflated series: "Differencing a process without a unit root, but with a trend, can actually produce bad results (the new, differenced error term can have a strange distribution that contains autocorrelation)". – IrishStat Oct 05 '19 at 14:59
  • https://stats.stackexchange.com/questions/428694/arma-model-with-ar-and-ma-coefficients-having-same-magnitude-but-opposite- is also relevant – IrishStat Oct 05 '19 at 15:18
  • Not only in that specific example: [Granger (1980)](https://www.utsc.utoronto.ca/~sdamouras/courses/STAD57H3_W13/Lecture%2019/Long%20memory%20relationships%20and%20the%20aggregation%20of%20dynamic%20models%20(Granger)%20-%20Copy.pdf) showed that aggregation of certain AR(1) series leads to an ARMA(N, N-1) process (unless cancellation of roots occurs), which would include plenty of macroeconomic series. – runr Feb 25 '21 at 06:39
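Related to mlofton's comment above that an AR(1) is an MA($\infty$): with a zero start-up, the AR(1) recursion $y_t = \phi y_{t-1} + \varepsilon_t$ equals the moving average $\sum_{j \ge 0} \phi^j \varepsilon_{t-j}$, which a short numpy check confirms (truncating the sum where $\phi^j$ is negligible; seed, length, and $\phi = 0.6$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, phi = 400, 0.6
e = rng.normal(size=n)

# AR(1) recursion started from zero: y[t] = phi*y[t-1] + e[t]
y = np.zeros(n)
for t in range(n):
    y[t] = (phi * y[t - 1] if t > 0 else 0.0) + e[t]

# The same process written as a (truncated) MA(inf): y[t] = sum_j phi^j * e[t-j]
k = 50  # phi**50 ~ 8e-12, so the truncation error is negligible
y_ma = np.array([sum(phi**j * e[t - j] for j in range(min(k, t + 1)))
                 for t in range(n)])

print(np.max(np.abs(y - y_ma)))  # tiny: the two representations coincide
```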