What guarantees the existence of a finite representation of the Wold decomposition? Mechanics and Intuition

Question

Every covariance stationary process can be written as a linear, infinite distributed lag of white noise. In other words, every covariance stationary process has a Wold representation. Then we go on to say that this infinite distributed lag of white noise can always be approximated by the ratio of two finite-order lag polynomials. In other words, for every (infinite) Wold representation there is a (finite) approximation. It is difficult to overestimate the importance of the existence of this approximation, as without it there would be no ARMA modelling, which is the core of linear time series modelling, and yet every single textbook I've seen only mentions the existence of such an approximation in one sentence, as if it were a self-evident fact.

(1) Why is it the case that the infinite Wold representation can always be approximated by the ratio of two finite-order polynomials? What guarantees the existence of such an approximation? (2) How good is this approximation? Is the approximation better in some cases than in others?

Do you want a nice proof of it, or are you asking more about the practicality of the result? Herman Bierns has the nicest proof of the Wold Decomposition that I have seen. If you google for it, I think it should come up. If not, let me know and I can look. As far as the practical part, every AR(1) can be written as an infinite MA, so that may be connected to the answer. Great question. — mlofton, Dec 03 '18 at 17:20
@mlofton: Thank you for the Bierns reference. I found it but it is far too complex for me... I do not yet understand "sub-Hilbert spaces". I also want to point out that my primary interest is not so much in the Wold representation, which is a beautiful result, but of no practical consequence because we cannot estimate an infinite number of parameters, but rather on the approximation of this infinite Wold representation by a ratio of finite order lag polynomials, which is of enormous practical consequence because we can estimate the parameters of these finite polynomials, hence ARMA. — ColorStatistics, Dec 03 '18 at 17:45
I apologize for getting his name wrong: It's Bierens, but you found it anyway. Now I understand your interest better. There is a paper by Jorgenson that you should check out. The idea is that an AR(1) is an infinite MA, so ARMA models are not as finite as they look. This is the paper that I think might help you: https://www.econometricsociety.org/publications/econometrica/1966/01/01/rational-distributed-lag-functions. If you have JSTOR (I use JPASS, it's pretty reasonable), you can get it. If you can't, I have it. — mlofton, Dec 04 '18 at 08:06
Keep in mind that I haven't read that paper in a long time, so I can't guarantee it will help. But note that a certain specific case of a distributed lag model (the Koyck distributed lag model) is a specific ARIMAX model. So they're all kind of related (ARIMA, ARIMAX, distributed lags, etc.), but I don't recall if the paper addresses your question explicitly. Still, it's worth checking out. Sometimes you never know from where the light will enter. — mlofton, Dec 04 '18 at 08:18
@mlofton: Thank you very much for the Jorgenson reference. I obtained a copy of it. It appears to address my question spot on. I'll read it carefully and circle back. — ColorStatistics, Dec 04 '18 at 13:23
@ColorStatistics: I truly wasn't sure about its utility, but I'm glad to hear that it sounds helpful. All the best. — mlofton, Dec 05 '18 at 16:07
@mlofton and others: The article above (Jorgenson, 1966) is insightful, as it tackles how a distributed lag function can be approximated by a finite distributed lag function and by a rational distributed lag function, showing that the latter is a more parsimonious approximation. Interestingly, the article predates the development of ARMA, but one can imagine that the infinite-order distributed lag function we're looking to approximate is the Wold representation. Still hoping someone who understands this well can present it in a digestible way, in the context of the Wold representation. — ColorStatistics, Dec 07 '18 at 13:00
ColorStatistics: Thanks for the summary. This site is great, but if the question doesn't get answered, you may want to post it on economics.stackexchange.com. There are some really talented people over there also. — mlofton, Dec 08 '18 at 08:08
ColorStatistics: I read your question again. I don't know if this helps, but an AR(1) can be written as an infinite MA. So that may be a reason why the approximation by the ratio of two finite lag polynomials can be reasonable, but I don't know a proof. There might be something in one of Bierens's books regarding this, but I don't have them at my fingertips. They're somewhere, but I have no idea where. — mlofton, Dec 08 '18 at 08:12
Wikipedia says *The general ARMA model was described in the 1951 thesis of Peter Whittle*, so it is not predated by a paper from 1966. — Richard Hardy, Sep 20 '21 at 15:19

Ben · Accepted Answer · 2022-01-07T01:17:26.207

Actually, without some further assumptions on the form of the transfer function in the Wold representation, I don't think it is actually true that it can always be well approximated by a ratio of finite-order polynomials. There are classes of time-series models for covariance-stationary processes where this approximation is not considered adequate --- e.g., when dealing with some "long memory" processes.

Analysis via the spectral density: To gain some insight into this aspect of time-series analysis, it is useful to look at the spectral density of a covariance-stationary process. This is fairly natural, since it allows us to see the process in frequency-space. Consider a covariance-stationary process $\{ X_t | t \in \mathbb{Z} \}$, meaning that its first two moments have the form:

$$\mathbb{E}(X_t) = \mu \quad \quad \quad \mathbb{Cov}(X_{t+r}, X_{t}) = \gamma(r),$$

where $\gamma$ is the autocovariance function. If the process has an absolutely continuous spectral distribution then it has a spectral density, which we can write as:

$$f(\delta) = \frac{1}{2 \pi} \sum_{r \in \mathbb{Z}} \gamma(r) e^{2 \pi i r \delta}.$$

This function is periodic, and we can examine it over the Nyquist range $-\tfrac{1}{2} \leqslant \delta \leqslant \tfrac{1}{2}$, which gives a full period. Now, under the standard ARMA representation $\phi(B) X_t = \theta(B) \varepsilon_t$ with $\sigma^2 = \mathbb{V}(\varepsilon_t)$ (which leads to a ratio of two finite polynomials for the $\text{MA}(\infty)$ representation) we get the spectral density:

$$f(\delta) = \frac{\sigma^2}{2 \pi} \bigg| \frac{\theta(e^{2 \pi i \delta})}{\phi(e^{2 \pi i \delta})} \bigg|^2.$$

In particular, at the zero frequency we get:

$$f(0) = \frac{\sigma^2}{2 \pi} \bigg| \frac{\theta(1)}{\phi(1)} \bigg|^2.$$
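
As a minimal numerical sketch of the ARMA spectral density above, the following Python function evaluates it on the same frequency scale. The function name `arma_spectral_density`, its arguments, and the sign conventions $\phi(z) = 1 - \sum_k \phi_k z^k$ and $\theta(z) = 1 + \sum_k \theta_k z^k$ are assumptions made for illustration; the answer itself does not fix them.

```python
import numpy as np

def arma_spectral_density(delta, ar=(), ma=(), sigma2=1.0):
    """Evaluate f(delta) = sigma2 / (2 pi) * |theta(z) / phi(z)|^2 at z = exp(2 pi i delta).

    Assumed conventions (illustrative, not from the answer):
      phi(z)   = 1 - ar[0] z - ar[1] z^2 - ...
      theta(z) = 1 + ma[0] z + ma[1] z^2 + ...
    """
    z = np.exp(2j * np.pi * np.asarray(delta, dtype=float))
    phi = np.ones_like(z)
    for k, a in enumerate(ar):
        phi = phi - a * z ** (k + 1)
    theta = np.ones_like(z)
    for k, b in enumerate(ma):
        theta = theta + b * z ** (k + 1)
    return sigma2 / (2 * np.pi) * np.abs(theta / phi) ** 2

# Example: an AR(1) with coefficient 0.5 on a grid over the Nyquist range.
grid = np.linspace(-0.5, 0.5, 5)
density = arma_spectral_density(grid, ar=[0.5])
```

For any stationary ARMA model the value at `delta = 0` is finite, because `phi` evaluated at $z = 1$ is nonzero and `theta` is a finite sum.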

For many covariance-stationary processes, this form approximates the true spectral density fairly well. However, certain kinds of covariance-stationary time series are not well approximated by this form. Particular things that affect this are the rate of decay of the transfer function in the tails (e.g., exponential decay, power-law decay, etc.) and whether the time-series process is "short memory" or "long memory".

One specific case where the ARMA representation is not a particularly good approximation is when the time-series process has "long memory". This phenomenon is defined by the spectral property that $f(0)=\infty$, which means that the autocovariance function has the divergent sum $\sum_{r \in \mathbb{Z}} \gamma(r) = \infty$. This property cannot be achieved within the standard stationary ARMA form, since $|\theta(1)| \leqslant \sum_i |\theta_i| < \infty$ and $|\phi(1)|>0$, so $f(0)$ is always finite.
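
To make the contrast concrete, here is a small sketch using one standard long-memory example that is not named in the answer: the fractionally integrated filter $(1 - B)^{-d}$ with $0 < d < \tfrac{1}{2}$. Its $\text{MA}(\infty)$ weights decay like a power law, so their partial sums keep growing, whereas the weights of an AR(1) decay geometrically and sum to $1/(1-\phi)$. The helper names `frac_psi` and `ar1_psi` and the parameter values are hypothetical choices for illustration.

```python
import numpy as np

def frac_psi(d, n):
    """First n MA(infinity) weights of (1 - B)^(-d).

    Uses the recursion psi_0 = 1, psi_j = psi_{j-1} * (j - 1 + d) / j.
    """
    psi = np.empty(n)
    psi[0] = 1.0
    for j in range(1, n):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    return psi

def ar1_psi(phi, n):
    """First n MA(infinity) weights of an AR(1): psi_j = phi^j."""
    return phi ** np.arange(n)

# Partial sums of the weights: the first column keeps growing with n,
# the second settles near 1 / (1 - 0.9).
for n in (10**2, 10**3, 10**4, 10**5):
    print(n, frac_psi(0.3, n).sum(), ar1_psi(0.9, n).sum())
```

Since $f(0)$ is proportional to the squared sum of the weights, the diverging partial sums in the first column are the time-domain counterpart of $f(0) = \infty$.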

Why is it the case that the infinite Wold representation can always be approximated by the ratio of two finite order polynomials?

Unless you impose some accuracy requirements or convergence conditions on the approximation, anything can be approximated by anything. So the question really becomes, under what conditions can we approximate the Wold representation with an ARMA model and still get good approximation properties (e.g., convergence, arbitrary accuracy with a finite order model, etc.)? I will address this in your subsequent questions.

What guarantees the existence of such an approximation?

Certain general forms for the transfer function in the Wold representation can be represented as power series that can be approximated up to arbitrary accuracy by a finite rational function (i.e., a ratio of two finite polynomials). This is a broad topic in real/complex analysis, and I recommend you go back to basics and have a look at the general topic of Taylor series representations of functions, and the classes of holomorphic/analytic functions. You will see that there are certain nasty classes of functions (e.g., periodic functions) that are not well-approximated by a polynomial, and other functions that are not well-approximated by a ratio of finite polynomials.
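
A small sketch of why a rational function can be a far more parsimonious approximation than a polynomial, using the AR(1) case mentioned in the comments. The function names `arma_psi` and `truncation_mse`, and the parameter values, are assumptions for illustration. The weights $\psi_j = \phi^j$ are reproduced exactly by the one-parameter rational function $1/(1 - \phi z)$, while a truncated polynomial (a finite MA) needs many terms when $\phi$ is close to one.

```python
import numpy as np

def arma_psi(ar, ma, n):
    """First n power-series coefficients of theta(z) / phi(z).

    Assumed conventions: phi(z) = 1 - sum_k ar[k-1] z^k,
    theta(z) = 1 + sum_k ma[k-1] z^k.
    Recursion: psi_0 = 1, psi_j = theta_j + sum_{k=1}^{min(j,p)} ar[k-1] * psi_{j-k}.
    """
    psi = np.zeros(n)
    for j in range(n):
        if j == 0:
            acc = 1.0
        else:
            acc = ma[j - 1] if j - 1 < len(ma) else 0.0
        for k in range(1, min(j, len(ar)) + 1):
            acc += ar[k - 1] * psi[j - k]
        psi[j] = acc
    return psi

def truncation_mse(psi, q):
    """Mean-square error (unit innovation variance) from dropping all weights beyond lag q."""
    return float(np.sum(psi[q + 1:] ** 2))

# AR(1) with phi = 0.9: error of the best finite MA(q) obtained by truncation.
psi = arma_psi([0.9], [], 2000)
for q in (5, 20, 50):
    print(q, truncation_mse(psi, q))
```

Here the truncation error is $\phi^{2(q+1)}/(1-\phi^2)$, which shrinks only slowly in $q$ when $\phi$ is near one, whereas the rational form is exact with a single coefficient.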

As previously noted, without some further assumptions on the form of the transfer function, I don't think it is actually true that it can always be well-approximated by a ratio of finite-order polynomials. There are some kinds of covariance-stationary time-series processes where the transfer function is "nasty" and is not well-approximated by the ARMA form. A specific case of this is "long memory" processes.

The other answer here notes that any meromorphic function can be approximated well by a finite rational function (i.e., a ratio of two finite polynomials). This is true, but it just pushes the question back one step: under what conditions will the transfer function in the Wold representation give rise to a meromorphic power series?

How good is this approximation? Is the approximation better in some cases than in other?

Approximation by an ARMA model is certainly better in some cases than others. ARMA models can approximate most "short memory" processes quite well, but they are not great at approximating "long memory" processes. The more general question, how good is the approximation, is large enough to fill entire books --- the answer will depend on the nature of the transfer function in the Wold representation, and how you measure "goodness" of an approximation.

This is a terrific answer, Ben, and I say that because it gives me the needed guidance to understand what is involved here and what I need to understand along the way before I can appreciate the complexities around this question. As I learn these stepping stone concepts, I will check back here to better appreciate what you said and to ask clarifying questions. Thank you. — ColorStatistics, Sep 23 '21 at 21:14
Glad you found it useful. It was an excellent question, deserving of a helpful answer. As you obviously appreciate, this is a big topic, so it is one of those things where you will need to go and read some outside stuff and mull it over. Even just the general topic of Taylor series and polynomial approximations is huge in its own right, so that should keep you busy! — Ben, Sep 23 '21 at 21:19

score 1 · Answer 2 · answered Feb 16 '20 at 08:30

The Wold decomposition itself is a trivial fact. It is just the Gram-Schmidt orthogonalization procedure. In the time series context, the Hilbert space in question is the space of random variables with finite second moments.
Just to state the Wold decomposition: For any covariance stationary time series $\{X_t\}$, there exist innovations $\{\epsilon_t\}$ such that $\{X_t\}$ is a two-sided MA$(\infty)$ process with respect to $\{\epsilon_t\}$. In the usual heuristic notation, $$ X_t = f(B)\epsilon_t $$ where $f(z) = \sum_{h \in \mathbb{Z}} \gamma_h z^h$.

The series converges in a couple of senses:

First, it converges in the $L^2$-norm uniformly in $t$. In other words, $\{X_t\}$ can be approximated, in the $L^2$-norm uniformly in $t$, by the corresponding truncated finite-order MA sum.

Second, for any given $t$, it converges almost surely. In other words, for any given $t$, the corresponding truncated MA sum converges to $X_t$ with probability $1$.

Consider the Laurent series $f(z) = \sum_{h \in \mathbb{Z}} \gamma_h z^h$, which defines a meromorphic function on some open annulus in the complex plane.

Any meromorphic function $f(z)$ can be approximated (uniformly on compact sets) by a rational function $\frac{\Theta(z)}{\Phi(z)}$. In the time series context, this means $$ X_t = f(B)\epsilon_t $$ can be approximated, in some sense, by the ARMA process $$ X'_t = \frac{\Theta(B)}{\Phi(B)}\epsilon_t. $$

A Couple of Caveats

First, exactly how "uniformly on compact sets" translates to approximation of random variables is not clear. It is part of standard hand-waving folklore. To make this more precise, one needs to know what "uniformly on compact sets" means in terms of the series coefficients. Second, in the non-causal case, replace $z$ (resp. $B$) by $\frac{1}{z}$ (resp. $B^{-1}$, the forward shift).
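
On the first caveat, one possible way to connect the two notions is sketched below. It assumes that both filters are applied to the same white noise with variance $\sigma^2$, that the coefficients are real and square-summable, and that the unit circle lies inside the region where both series converge. The symbols $g(z) = \frac{\Theta(z)}{\Phi(z)} = \sum_{h} g_h z^h$ are illustrative notation, not from the answer.

$$ \mathbb{E}\,|X_t - X'_t|^2 = \sigma^2 \sum_{h \in \mathbb{Z}} (\gamma_h - g_h)^2 = \frac{\sigma^2}{2\pi} \int_{-\pi}^{\pi} \big| f(e^{i\omega}) - g(e^{i\omega}) \big|^2 \, d\omega \leqslant \sigma^2 \sup_{|z| = 1} |f(z) - g(z)|^2. $$

The first equality uses the orthogonality of the innovations and the second is Parseval's identity. Under these assumptions, uniform approximation of $f$ by $g$ on the unit circle (a compact set) implies mean-square approximation of $X_t$ by $X'_t$, uniformly in $t$. The argument does not apply when $f$ has a singularity on the unit circle, which is the situation for the long-memory processes discussed in the other answer.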

Thank you, Michael. Your explanation goes beyond my current knowledge. I'll let others comment on its correctness. As far as intuition, did you want to add some to your answer? — ColorStatistics, Mar 26 '20 at 14:01
Thank you for this answer, Michael. Your answer could get an award for complexity. It might be correct, who knows, but I find it altogether indigestible and lacking in intuition. — ColorStatistics, Sep 20 '21 at 14:23
