
I'm trying to reproduce the prediction of an MA process using the model parameters. The MA(1) process is $$X_t = \Theta \omega_{t-1} + \omega_t,$$ so I would expect the one-step-ahead prediction to be $$\hat{X}_t = \Theta \omega_{t-1}.$$

But it seems that Statsmodels has some "initialization" that affects the head of the prediction.

For example, here I fit an MA(1) model and print the difference between my prediction using the model parameters and the Statsmodels prediction:

import statsmodels.api as sm

y = [1, 2, 0, 1, 0.5, 1.5, 1, 1, 1.3, 2, 0.7]
model = sm.tsa.SARIMAX(y, order=(0, 0, 1)).fit(disp=False)
prediction = model.predict()   # in-sample one-step-ahead predictions
residuals = model.resid        # estimated shocks omega_t
params = model.params          # [MA(1) coefficient, innovation variance]
print([params[0] * residuals[i] - prediction[i + 1] for i in range(len(y) - 1)])

In the results you can see a non-zero difference (which becomes smaller over time).

> [0.057898976628312615,
>  0.015479432460304365,
>  -0.0010168740694590506,
>  0.00032489606054031395,
>  -6.4951881716558e-07,
>  1.0618887241187203e-05,
>  4.5199844286858415e-07,
>  1.6323989904254432e-07,
>  3.073650017837437e-08,
>  8.574563170604677e-09]

Can anyone explain what model Statsmodels is using to get this prediction?

Note that the same phenomenon also exists in R; see here.

Itzik

1 Answer


The key here is that the error process $\omega_t$ is not known and so you must estimate it using a statistical model. In SARIMAX and in R, this estimation is basically done via Bayesian updating - you begin with a guess (your prior) and then you use a new observation to update your guess (now the posterior).

Notice that at any time $t$, you will never have information about $\omega_{t+1}$, so your prior is the unconditional distribution, $\omega_{t+1} \sim N(0, \sigma^2)$. What does change is how much information you have about $\omega_t$.

Before observing any information at all, the best you can do about $\omega_0$ is to again set your prior according to the unconditional distribution. Given this, your prior about $X_1 = \Theta \omega_0 + \omega_1$ is $E(X_1) = 0$ and $Var(X_1) = (1 + \Theta^2) \sigma^2$.
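
To spell out the variance step: since $\omega_0$ and $\omega_1$ are independent, $$Var(X_1) = Var(\Theta \omega_0 + \omega_1) = \Theta^2 \sigma^2 + \sigma^2 = (1 + \Theta^2) \sigma^2.$$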

Once you observe $X_1$, you update your estimate of $\omega_1$ via Bayesian updating. Because you now have some information, this is a better estimate than the unconditional estimate, but you still have some uncertainty about $\omega_1$.

As you observe more data $X_2, X_3, \dots$, your time $t$ uncertainty about $\omega_t$ continues to fall as you learn about the error process by comparing predictions to observations. Eventually, this conditional uncertainty becomes negligible (i.e., eventually $Var(\omega_t \mid X_t, X_{t-1}, \dots) \approx 0$), and at this point you can be essentially certain that you know the value of $\omega_t$ at time $t$.

Under this condition, your prior expectation for $X_{t+1}$ becomes $E[X_{t+1} \mid X_t, X_{t-1}, \dots] \approx \Theta \omega_t$, and this is what you expected in your question. The main point is that this is not the appropriate prior unless you are certain about $\omega_t$, and that can't happen at the beginning of the sample when you have no information about it.
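
For concreteness, here is a minimal sketch of this sequential updating for the MA(1) case, implemented with the Kalman filter on a Harvey-type state-space form and initialized with the unconditional distribution described above. This is not necessarily statsmodels' exact internal representation, but its one-step-ahead predictions should reproduce model.predict() from the question up to numerical detail:

import numpy as np
import statsmodels.api as sm

y = [1, 2, 0, 1, 0.5, 1.5, 1, 1, 1.3, 2, 0.7]
model = sm.tsa.SARIMAX(y, order=(0, 0, 1)).fit(disp=False)
theta, sigma2 = model.params             # MA(1) coefficient and innovation variance

# State alpha_t = [X_t, theta * omega_t]', observed without noise: y_t = [1, 0] alpha_t
T = np.array([[0.0, 1.0], [0.0, 0.0]])   # transition matrix
R = np.array([[1.0], [theta]])           # loading of the new shock omega_{t+1}
Z = np.array([[1.0, 0.0]])               # observation vector

# Unconditional (stationary) initialization: a_1 = 0, P_1 = Var(alpha_t)
a = np.zeros((2, 1))
P = np.array([[(1 + theta**2) * sigma2, theta * sigma2],
              [theta * sigma2,          theta**2 * sigma2]])

preds, Fs = [], []
for yt in y:
    preds.append((Z @ a).item())         # one-step-ahead prediction of y_t
    F = (Z @ P @ Z.T).item()             # prediction error variance
    Fs.append(F)
    v = yt - (Z @ a).item()              # prediction error
    K = P @ Z.T / F                      # Kalman gain
    a = a + K * v                        # update: condition the state on y_t
    P = P - K @ Z @ P
    a = T @ a                            # predict: move the state one period ahead
    P = T @ P @ T.T + sigma2 * (R @ R.T)

print(np.array(preds) - model.predict()) # differences should be ~0
print(np.array(Fs) - sigma2)             # shrinks toward 0 as omega_t becomes known

The first entry of Fs equals $(1 + \Theta^2)\sigma^2$ and the sequence decays toward $\sigma^2$, which is exactly the initialization effect visible at the head of the prediction.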

cfulton
  • Thanks. Can you cite a reference for the algorithm behind this process? It seems that the effect becomes more significant when d>1, and the convergence to the naive process is very slow. – Itzik Oct 24 '19 at 13:28
  • This algorithm is called the Kalman filter. I'm not sure what `d` is. – cfulton Oct 26 '19 at 15:00