
I'm trying to reproduce the prediction of an MA process using the model parameters. The MA(1) process is $$X_t = \Theta \omega_{t-1} + \omega_t,$$ so I would expect the one-step-ahead prediction to be $$\hat{X}_t = \Theta \omega_{t-1}.$$

But it seems that Statsmodels has some "initialization" that affects the head of the prediction.

For example, here I fit an MA(1) model and print the difference between my prediction using the model parameters and the Statsmodels prediction:

import statsmodels.api as sm

y = [1, 2, 0, 1, 0.5, 1.5, 1, 1, 1.3, 2, 0.7]
model = sm.tsa.SARIMAX(y, order=(0, 0, 1)).fit(disp=False)
prediction = model.predict()   # in-sample one-step-ahead predictions
residuals = model.resid        # estimated shocks omega_t
params = model.params          # [MA(1) coefficient, innovation variance]
print([params[0] * residuals[i] - prediction[i + 1] for i in range(len(y) - 1)])

In the results you can see a non-zero difference (which becomes smaller over time).

> [0.057898976628312615,
>  0.015479432460304365,
>  -0.0010168740694590506,
>  0.00032489606054031395,
>  -6.4951881716558e-07,
>  1.0618887241187203e-05,
>  4.5199844286858415e-07,
>  1.6323989904254432e-07,
>  3.073650017837437e-08,
>  8.574563170604677e-09]

Can anyone explain what model Statsmodels is using to get this prediction?

Note that the same phenomenon also exists in R; see here.

Itzik

1 Answer


The key here is that the error process $\omega_t$ is not known and so you must estimate it using a statistical model. In SARIMAX and in R, this estimation is basically done via Bayesian updating - you begin with a guess (your prior) and then you use a new observation to update your guess (now the posterior).

Notice that at any time $t$, you will never have information about $\omega_{t+1}$, so your prior is the unconditional distribution, $\omega_{t+1} \sim N(0, \sigma^2)$. What does change is how much information you have about $\omega_t$.

Before observing any information at all, the best you can do about $\omega_0$ is to again set your prior according to the unconditional distribution. Given this, your prior about $X_1 = \Theta \omega_0 + \omega_1$ is $E(X_1) = 0$ and $Var(X_1) = (1 + \Theta^2) \sigma^2$.
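
To spell out the variance step: since $\omega_0$ and $\omega_1$ are independent, $$Var(X_1) = Var(\Theta \omega_0 + \omega_1) = \Theta^2 \sigma^2 + \sigma^2 = (1 + \Theta^2) \sigma^2.$$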

Once you observe $X_1$, you update your estimate of $\omega_1$ via Bayesian updating. Because you now have some information, this is a better estimate than the unconditional estimate, but you still have some uncertainty about $\omega_1$.

As you observe more data $X_2, X_3, \dots$, your time $t$ uncertainty about $\omega_t$ continues to fall as you learn about the error process by comparing predictions to observations. Eventually, this conditional uncertainty becomes negligible (i.e., eventually $Var(\omega_t \mid X_t, X_{t-1}, \dots) \approx 0$), and at this point you can be essentially certain that you know the value of $\omega_t$ at time $t$.

Under this condition, your prior expectation for $X_{t+1}$ becomes $E[X_{t+1} \mid X_t, X_{t-1}, \dots] \approx \Theta \omega_t$, and this is what you expected in your question. The main point is that this is not the appropriate prior unless you are certain about $\omega_t$, and that can't happen at the beginning of the sample when you have no information about it.
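
For concreteness, here is a minimal sketch of this sequential updating for the MA(1) case, implemented with the Kalman filter on a Harvey-type state-space form and initialized with the unconditional distribution described above. This is not necessarily statsmodels' exact internal representation, but its one-step-ahead predictions should reproduce model.predict() from the question up to numerical detail:

import numpy as np
import statsmodels.api as sm

y = [1, 2, 0, 1, 0.5, 1.5, 1, 1, 1.3, 2, 0.7]
model = sm.tsa.SARIMAX(y, order=(0, 0, 1)).fit(disp=False)
theta, sigma2 = model.params             # MA(1) coefficient and innovation variance

# State alpha_t = [X_t, theta * omega_t]', observed without noise: y_t = [1, 0] alpha_t
T = np.array([[0.0, 1.0], [0.0, 0.0]])   # transition matrix
R = np.array([[1.0], [theta]])           # loading of the new shock omega_{t+1}
Z = np.array([[1.0, 0.0]])               # observation vector

# Unconditional (stationary) initialization: a_1 = 0, P_1 = Var(alpha_t)
a = np.zeros((2, 1))
P = np.array([[(1 + theta**2) * sigma2, theta * sigma2],
              [theta * sigma2,          theta**2 * sigma2]])

preds, Fs = [], []
for yt in y:
    preds.append((Z @ a).item())         # one-step-ahead prediction of y_t
    F = (Z @ P @ Z.T).item()             # prediction error variance
    Fs.append(F)
    v = yt - (Z @ a).item()              # prediction error
    K = P @ Z.T / F                      # Kalman gain
    a = a + K * v                        # update: condition the state on y_t
    P = P - K @ Z @ P
    a = T @ a                            # predict: move the state one period ahead
    P = T @ P @ T.T + sigma2 * (R @ R.T)

print(np.array(preds) - model.predict()) # differences should be ~0
print(np.array(Fs) - sigma2)             # shrinks toward 0 as omega_t becomes known

The first entry of Fs equals $(1 + \Theta^2)\sigma^2$ and the sequence decays toward $\sigma^2$, which is exactly the initialization effect visible at the head of the prediction.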

cfulton
  • Thanks. Can you cite a reference for the algorithm behind this process? It seems that the effect becomes more significant when d>1, and the convergence to the naive process is very slow. – Itzik Oct 24 '19 at 13:28
  • This algorithm is called the Kalman filter. I'm not sure what `d` is. – cfulton Oct 26 '19 at 15:00