Under what circumstances is an MA process or AR process appropriate?

Question

I understand that if a process depends on previous values of itself, then it is an AR process. If it depends on previous errors, then it is an MA process.

When would either of these two situations occur? Does anyone have a solid example that illuminates what it means for a process to be best modeled as MA rather than AR?

It's not as simple a dichotomy as that; after all, an AR can be written as an infinite MA and an (invertible) MA can be written as an infinite AR, so if either is ever appropriate, arguably so is the other. — Glen_b, Jul 14 '14 at 06:13
Glen_b, can you elaborate on this? I understand it's not a simple dichotomy...am I correct to assume (hope, even) that there is something worth uncovering here? I don't want to simply run acf / pacf and pretend I have a good grasp on this process. — tumultous_rooster, Jul 14 '14 at 09:23
Very much related: [Real-life examples of moving average processes](https://stats.stackexchange.com/q/45026/1352) — Stephan Kolassa, Mar 15 '18 at 07:54

Glen_b · Answer 1 · 2018-10-03T00:51:58.380

One important and useful result is the Wold representation theorem (sometimes called the Wold decomposition), which says that every covariance-stationary time series $Y_{t}$ can be written as the sum of two time series, one deterministic and one stochastic.

$Y_t=\mu_t+\sum_{j=0}^\infty b_j \varepsilon_{t-j}\,$, where $\mu_t$ is deterministic.

The second term is an infinite MA.

(It's also the case that an invertible MA can be written as an infinite AR process.)

This suggests that if the series is covariance-stationary, and if we assume you can identify the deterministic part, then you can always write the stochastic part as an MA process. Similarly, if the MA satisfies the invertibility condition, you can always write it as an AR process.

If you have the process written in one form you can often convert it to the other form.
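As a rough illustration of that conversion, here is a minimal sketch that computes the leading weights of the "other" representation by matching coefficients of the lag polynomials. The function names and the sign convention (MA written as $y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots$) are my own assumptions for the example, not something taken from a particular library.

```python
def ma_weights_from_ar(phi, n_weights):
    """Leading MA(infinity) weights psi_0, psi_1, ... of a stationary AR(p).

    Assumed model: y_t = phi[0]*y_{t-1} + ... + phi[p-1]*y_{t-p} + e_t
    """
    psi = [1.0]
    for j in range(1, n_weights):
        # psi_j = sum_i phi_i * psi_{j-i}, for i = 1..min(j, p)
        psi.append(sum(phi[i - 1] * psi[j - i]
                       for i in range(1, min(j, len(phi)) + 1)))
    return psi


def ar_weights_from_ma(theta, n_weights):
    """Leading AR(infinity) weights pi_1, pi_2, ... of an invertible MA(q).

    Assumed model: y_t = e_t + theta[0]*e_{t-1} + ... + theta[q-1]*e_{t-q}
    AR form:       y_t = pi_1*y_{t-1} + pi_2*y_{t-2} + ... + e_t
    """
    c = [1.0]  # coefficients of the power series 1 / theta(B)
    for j in range(1, n_weights + 1):
        c.append(-sum(theta[i - 1] * c[j - i]
                      for i in range(1, min(j, len(theta)) + 1)))
    return [-cj for cj in c[1:]]


print(ma_weights_from_ar([0.5], 5))  # [1.0, 0.5, 0.25, 0.125, 0.0625]
print(ar_weights_from_ma([0.5], 4))  # [0.5, -0.25, 0.125, -0.0625]
```

For an AR(1) with coefficient 0.5 the MA weights are just powers of 0.5, and for an MA(1) with coefficient 0.5 the AR weights alternate in sign while shrinking at the same rate.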

So in one sense at least, for covariance stationary series, often either AR or MA will be appropriate.

Of course, in practice we would rather not have very large models. If you have a finite AR or MA, both the ACF and PACF eventually decay away geometrically (that is, the absolute value of each function is bounded above by a geometrically decaying function). This tends to mean that a good approximation of an AR by an MA, or of an MA by an AR, may often be reasonably short.
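To get a feel for "reasonably short", here is a small sketch for the simplest case, an invertible MA(1) with coefficient `theta`, whose AR($\infty$) weights have absolute value $|\theta|^j$ at lag $j$. The helper name and the idea of cutting off at a fixed tolerance are assumptions made for illustration; a real application would judge the approximation by fit or forecast quality instead.

```python
def truncation_order(theta, tol):
    """Smallest AR order k such that every omitted AR(infinity) weight of an
    invertible MA(1) with coefficient theta is below tol in absolute value."""
    if abs(theta) >= 1:
        raise ValueError("MA(1) is not invertible for |theta| >= 1")
    if tol <= 0:
        raise ValueError("tol must be positive")
    k = 0
    weight = abs(theta)  # absolute AR weight at lag k + 1
    while weight >= tol:
        k += 1
        weight *= abs(theta)
    return k


print(truncation_order(0.5, 0.01))  # 6
print(truncation_order(0.9, 0.01))  # 43
```

The closer the coefficient is to the invertibility boundary, the longer the approximating AR needs to be, which is where a short MA (or a mixed ARMA) becomes the more economical description.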

So under the covariance stationary condition and assuming we can identify the deterministic and stochastic components, often both AR and MA may be appropriate.

The Box-Jenkins methodology looks for a parsimonious model: an AR, MA or ARMA model with few parameters. Typically the procedure is to transform the series to stationarity (perhaps by differencing), identify a candidate model from the appearance of the ACF and PACF (sometimes people use other tools), fit the model, and then examine the structure of the residuals (typically via the ACF and PACF of the residuals), repeating until the residual series appears reasonably consistent with white noise. Often there will be multiple models that can provide a reasonable approximation to a series. (In practice other criteria are often considered.)
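For the identification step, a bare-bones sketch of the sample ACF and PACF may help show what is being looked at. This uses only the Python standard library; the function names are my own, the PACF is obtained from the ACF by the Durbin-Levinson recursion, and the simulated AR(1) with coefficient 0.6 is an arbitrary choice for the example.

```python
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf_from_acf(acf):
    """Partial autocorrelations at lags 1..len(acf)-1 (Durbin-Levinson)."""
    pacf, prev = [], []  # prev holds the AR(k-1) coefficients
    for k in range(1, len(acf)):
        num = acf[k] - sum(prev[j] * acf[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * acf[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        prev.append(phi_kk)
        pacf.append(phi_kk)
    return pacf


def simulate_ar1(phi, n, seed=0):
    """Simulate y_t = phi*y_{t-1} + e_t with standard normal shocks."""
    rng = random.Random(seed)
    y, last = [], 0.0
    for _ in range(n):
        last = phi * last + rng.gauss(0.0, 1.0)
        y.append(last)
    return y


acf = sample_acf(simulate_ar1(0.6, 5000), 5)
pacf = pacf_from_acf(acf)
print([round(r, 2) for r in acf[1:]])  # decays gradually for an AR(1)
print([round(p, 2) for p in pacf])     # close to zero beyond lag 1
```

The textbook reading is that a PACF that cuts off after lag p with a tailing ACF points toward AR(p), and the reverse pattern points toward MA(q); with real data the picture is rarely this clean.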

There are some grounds for criticism of this approach. For one example, the p-values that result from such an iterative process don't generally take account of the way the model was arrived at (by looking at the data); this issue might be at least partly avoided by sample splitting, for example. A second example criticism is the difficulty of actually obtaining a stationary series - while one may in many cases transform to obtain a series that seems reasonably consistent with stationarity, it's not usually going to be the case that it really is (similar issues are a common problem with statistical models, though perhaps it may sometimes be more of an issue here).

[The relationship between an AR and the corresponding infinite MA is discussed in Hyndman and Athanasopoulos' Forecasting: principles and practice, here]

-1 because, while it's sort of interesting, it doesn't really answer the spirit of the question. — Jake Westfall, Jul 14 '18 at 19:04
Hi Jake -- thanks for adding the comment about what you think is wrong with the answer. This is much more helpful than a downvote alone would be. I agree there's something lacking here - at the very least it should be made clear why I thought that's relevant enough to post as an answer. — Glen_b, Jul 14 '18 at 20:04
@jake I have made some edits which I hope make the connection to the question clearer. Thanks again for your help — Glen_b, Jul 14 '18 at 20:36
Thanks for the edits. I removed my downvote. My one-sentence condensation of your revised answer would be something like: "This is a difficult question to answer in general because, in a lot of cases, either an AR or MA model could fit the data just about as well as the other." Which is a legitimate, if disappointing, response to the question. — Jake Westfall, Jul 14 '18 at 22:47
@Jake don't feel the need to remove the downvote if you still have reservations; I appreciate the chance to improve the answer either way. — Glen_b, Jul 14 '18 at 22:57

score 8 · Answer 2 · answered Feb 20 '17 at 21:34

I can provide what I think is a compelling answer to the first part of the question ("whence MA?") but am presently pondering an equally compelling answer to the second part of the question ("whence AR?").

Consider a series consisting of the closing price (adjusted for splits and dividends) of a stock on consecutive days. Each day's closing price is derived from a trend (e.g., linear in time) plus the weighted effects of the daily shocks from prior days. Presumably, the effect of the shock at day t-1 will have a stronger influence on the price at day t than will the shock at day t-2, etc. Thus, logically, the stock's closing price at day t will reflect the trend value on day t plus a constant (less than 1) times the weighted sum of the shocks up through day t-1 (i.e., the error term at day t-1)(MA1), possibly plus a constant (less than 1) times the weighted sum of the shocks up through day t-2 (i.e., the error term at day t-2)(MA2), ..., plus the novel shock at day t (white noise). This kind of model seems appropriate for modelling series like the stock market, where the error term at day t represents the weighted sum of prior and current shocks, and defines an MA process. I am working through an equally compelling rationale for an exclusively-AR process.
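One way to see what this kind of model implies is to feed a single shock through an MA(2) and watch how long it lingers. The sketch below is only an illustration of the structure described above: the coefficients 0.6 and 0.3 and the linear trend are made-up values, and the function name is hypothetical.

```python
def ma_path(shocks, theta, trend=None):
    """Series y_t = trend(t) + e_t + theta[0]*e_{t-1} + ... + theta[q-1]*e_{t-q}.

    shocks: the sequence e_0, e_1, ...; shocks before t = 0 are taken as zero.
    trend:  optional function of t giving the deterministic part.
    """
    path = []
    for t, shock in enumerate(shocks):
        value = shock
        for lag, coef in enumerate(theta, start=1):
            if t - lag >= 0:
                value += coef * shocks[t - lag]
        if trend is not None:
            value += trend(t)
        path.append(value)
    return path


# A single unit shock on day 3, no trend: the effect lasts exactly q = 2 more days.
one_shock = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(ma_path(one_shock, [0.6, 0.3]))
# [0.0, 0.0, 0.0, 1.0, 0.6, 0.3, 0.0, 0.0]

# The same shock on top of a hypothetical linear trend.
print(ma_path(one_shock, [0.6, 0.3], trend=lambda t: 100.0 + 0.5 * t))
```

In a pure MA(q) the shock vanishes completely after q periods, whereas in an AR process a shock feeds back through the lagged values of the series and only dies away gradually.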

+1. This is the only answer so far that even attempts to answer the original question. — Stephan Kolassa, Mar 15 '18 at 07:54
If I understand it right, it sounds like the AR process is better for correcting for recurring trends, but MA is better for correcting for large, non-recurring shocks. — Mike Campbell, Jul 15 '19 at 16:36

power · Answer 3 · 2018-03-15T06:58:02.383

So you have a univariate time series and you want to model it/forecast it, right? You have chosen to use an ARIMA type model.

The parameters of the model depend on what's best for your dataset. But how do you find out? A recent approach is "Automatic time series forecasting" by Hyndman & Khandakar (2008) (pdf).

The algorithm tries different values of p, q, P and Q and chooses the combination with the smallest AIC, AICc or BIC. It is implemented in the auto.arima() function of the forecast R package. The choice of information criterion depends on which parameters you pass to the function.

For a linear model, choosing the model with the smallest AIC is asymptotically equivalent to leave-one-out cross-validation.
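To make the idea of picking an order by information criterion concrete, here is a deliberately simplified sketch. It is not the auto.arima() algorithm: it only compares pure AR(p) models, fits them by Yule-Walker (Levinson-Durbin recursion) rather than maximum likelihood, and uses the Gaussian approximation AIC ≈ n·log(σ̂²) + 2p up to an additive constant. All function names and the simulated AR(2) coefficients are assumptions for the example.

```python
import math
import random


def autocovariances(x, max_lag):
    """Sample autocovariances at lags 0..max_lag (divisor n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / n
            for k in range(max_lag + 1)]


def ar_aic_table(x, max_order):
    """Approximate AIC for AR(0)..AR(max_order), fitted by Yule-Walker."""
    n = len(x)
    gamma = autocovariances(x, max_order)
    sigma2 = gamma[0]             # innovation variance of the AR(0) fit
    aic = [n * math.log(sigma2)]
    prev = []                     # coefficients of the AR(k-1) fit
    for k in range(1, max_order + 1):
        num = gamma[k] - sum(prev[j] * gamma[k - 1 - j] for j in range(k - 1))
        phi_kk = num / sigma2
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        prev.append(phi_kk)
        sigma2 *= 1.0 - phi_kk ** 2
        aic.append(n * math.log(sigma2) + 2 * k)
    return aic


def simulate_ar2(phi1, phi2, n, seed=0):
    """Simulate y_t = phi1*y_{t-1} + phi2*y_{t-2} + e_t, standard normal shocks."""
    rng = random.Random(seed)
    y, y1, y2 = [], 0.0, 0.0
    for _ in range(n):
        value = phi1 * y1 + phi2 * y2 + rng.gauss(0.0, 1.0)
        y.append(value)
        y1, y2 = value, y1
    return y


aic = ar_aic_table(simulate_ar2(0.5, -0.4, 3000), 6)
best_order = min(range(len(aic)), key=aic.__getitem__)
print(best_order, [round(a, 1) for a in aic])
```

AIC tends to err on the side of slightly larger models, so the selected order can exceed the true one; that is one reason AICc or BIC are offered as alternatives.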

You should also make sure that you have enough data, at least four years.

Some important checks:

Does the model make sense? For example, if you have monthly retail sales, you would probably expect a seasonal model to be fitted.
How well does it forecast out of sample?

Explicit answer to Firebug's comment below: When your data supports it.

This answer doesn't answer the question at all: `"My question is, when would one of either of these two situations occur? "` — Firebug, Nov 15 '17 at 12:43
"Explicit answer to Firebug's comment below: When your data supports it." I agree with @Firebug - this is not an answer to the question and definitely not a solid example for differentiating between the two... — Thomas, Mar 26 '18 at 13:00
