Not a big deal - it is strongly stationary and approaches white noise
The non-invertible $\text{MA}(1)$ process makes perfect sense, and it does not exhibit any particularly strange behaviour. Taking the Gaussian version of the process, for any vector $\mathbf{y} = (y_1,...,y_n)$ consisting of consecutive observations, we have $\mathbf{y} \sim \text{N}(\mathbf{0}, \mathbf{\Sigma})$ with covariance:
$$\mathbf{\Sigma} \equiv \frac{\sigma^2}{1+\theta^2} \begin{bmatrix}
1+\theta^2 & -\theta & 0 & \cdots & 0 & 0 & 0 \\
-\theta & 1+\theta^2 & -\theta & \cdots & 0 & 0 & 0 \\
0 & - \theta & 1+\theta^2 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1+\theta^2 & -\theta & 0 \\
0 & 0 & 0 & \cdots & -\theta & 1+\theta^2 & -\theta \\
0 & 0 & 0 & \cdots & 0 & -\theta & 1+\theta^2 \\
\end{bmatrix}.$$
As you can see, this is a strongly stationary process, and observations that are more than one lag apart are independent, even when $|\theta|>1$. This is unsurprising, in view of the fact that such observations do not share any of the underlying white noise terms. There does not appear to be any behaviour in which "the effect of past observations increases with the distance", and the equation you have stated does not establish this (see below for further discussion).
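This is easy to confirm by simulation. Below is a minimal sketch (my own illustration, not part of the question) that simulates a non-invertible Gaussian $\text{MA}(1)$, with the white noise scaled to match the normalisation of $\mathbf{\Sigma}$ above, and compares the sample autocovariances with the theoretical values; the parameter values are arbitrary.

```python
# Minimal simulation sketch: sample autocovariances of a non-invertible
# Gaussian MA(1), compared with the matrix Sigma above.
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n = 2.5, 1.0, 500_000        # |theta| > 1, so non-invertible

# Scale the white noise so that V(y_t) = sigma^2, matching the
# sigma^2 / (1 + theta^2) normalisation in the covariance matrix above
eps = rng.normal(0.0, sigma / np.sqrt(1 + theta**2), size=n + 1)
y = eps[1:] - theta * eps[:-1]             # y_t = eps_t - theta * eps_{t-1}

for k in range(4):
    gamma_hat = np.mean(y[k:] * y[:n - k])  # sample autocovariance at lag k
    print(f"lag {k}: {gamma_hat:+.4f}")
# Expected: lag 0 ~ sigma^2 = 1, lag 1 ~ -theta * sigma^2 / (1 + theta^2)
# ~ -0.3448, and all higher lags ~ 0, despite |theta| > 1.
```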
In fact, as $|\theta| \rightarrow \infty$ (which is the most extreme case of the phenomenon you are considering) the model reduces asymptotically to a trivial white noise process. This is completely unsurprising, in view of the fact that a large coefficient on the first-lagged error term dominates the unit coefficient on the concurrent error term, and shifts the model asymptotically towards the form $y_t \rightarrow -\theta \epsilon_{t-1}$, which is just a scaled and lagged version of the underlying white noise process.
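To make the limit explicit, divide the process equation $y_t = \epsilon_t - \theta \epsilon_{t-1}$ through by $-\theta$:
$$\frac{y_t}{-\theta} = \epsilon_{t-1} - \frac{\epsilon_t}{\theta} \longrightarrow \epsilon_{t-1}
\quad \quad \text{as } |\theta| \rightarrow \infty,$$
so, up to a deterministic rescaling, the series is asymptotically just the lagged white noise process.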
A note on your equation: In the equation in your question you write the current value of the observable time series as a geometrically increasing sum of past values, plus the left-over error terms. This is asserted to show that "the effect of past observations increases with the distance". However, the equation involves a large number of cancelling terms. To see this, let's expand out the past observable terms and make the cancellation explicit:
$$\begin{aligned}
y_t
&= \epsilon_t - \sum_{i=1}^{t-1} \theta^i y_{t-i} - \theta^t \epsilon_0 \\[6pt]
&= \epsilon_t - \sum_{i=1}^{t-1} \theta^i (\epsilon_{t-i} - \theta \epsilon_{t-i-1}) - \theta^t \epsilon_0 \\[6pt]
&= \epsilon_t - ( \theta \epsilon_{t-1} - \theta^2 \epsilon_{t-2} ) \\[6pt]
&\quad \quad \quad \quad \quad \ \ \ - ( \theta^2 \epsilon_{t-2} - \theta^3 \epsilon_{t-3} ) \\[6pt]
&\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad - ( \theta^3 \epsilon_{t-3} - \theta^4 \epsilon_{t-4} ) \\[6pt]
&\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \ \ \ - \ \cdots \\[6pt]
&\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \ \ \ - ( \theta^{t-1} \epsilon_1 - \theta^t \epsilon_0 ) - \theta^t \epsilon_0 \\[6pt]
&= \epsilon_t - \theta \epsilon_{t-1}.
\end{aligned}$$
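If you would like to see the cancellation numerically, here is a short sketch (my own check, with arbitrary parameter values) verifying that the right-hand side of your equation collapses back to $y_t = \epsilon_t - \theta \epsilon_{t-1}$:

```python
# Numerical check: the expansion above telescopes back to the simple
# MA(1) form y_t = eps_t - theta * eps_{t-1}.
import numpy as np

rng = np.random.default_rng(1)
theta, t = 2.5, 12                      # arbitrary non-invertible case
eps = rng.normal(size=t + 1)            # eps_0, eps_1, ..., eps_t
y = eps[1:] - theta * eps[:-1]          # y[k-1] holds y_k for k = 1, ..., t

# Right-hand side: eps_t - sum_{i=1}^{t-1} theta^i y_{t-i} - theta^t eps_0
rhs = eps[t] - sum(theta**i * y[t - i - 1] for i in range(1, t)) - theta**t * eps[0]

print(np.isclose(rhs, y[t - 1]))                     # True
print(np.isclose(rhs, eps[t] - theta * eps[t - 1]))  # True
```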
We can see from this expansion that the geometrically increasing sum of past values of the observable time series is there solely to get the previous error term:
$$\epsilon_{t-1} = \sum_{i=1}^{t-1} \theta^{i-1} y_{t-i} + \theta^{t-1} \epsilon_0.$$
All that is happening here is that you are trying to express the previous error term in an awkward way. The fact that a long cancelling sum of geometrically weighted values of the series is equal to the desired error term does not demonstrate that past observations are having "an effect" on the present time-series value. It merely means that if you want to express $\epsilon_{t-1}$ in terms of $\epsilon_0$ then the only way you can do it is to add in the geometrically weighted sum of the observable series.
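The same point can be checked numerically: the sketch below (my own illustration, with arbitrary parameter values) confirms that the geometrically weighted sum of past observations, plus the initial-error term, merely reconstructs $\epsilon_{t-1}$.

```python
# Numerical check: the geometrically weighted sum of past observations,
# plus the initial-error term, reconstructs eps_{t-1} exactly.
import numpy as np

rng = np.random.default_rng(1)
theta, t = 2.5, 12                      # arbitrary non-invertible case
eps = rng.normal(size=t + 1)            # eps_0, eps_1, ..., eps_t
y = eps[1:] - theta * eps[:-1]          # y[k-1] holds y_k for k = 1, ..., t

recovered = sum(theta**(i - 1) * y[t - i - 1] for i in range(1, t)) + theta**(t - 1) * eps[0]
print(np.isclose(recovered, eps[t - 1]))  # True: the sum is just eps_{t-1}
```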