
I have fit an ARIMAX(1, 1, 0) model to a time series dataset consisting of one endogenous series ("Y") and one exogenous series ("X"). The exogenous input to the model was defined as sm.add_constant(df["X"]). Stationarity and invertibility were enforced in the statsmodels SARIMAX model.

The output of the model is as seen in the attached image:

[Image: model summary]

For the fourth to the last record in the timeseries:

  • the model's predicted (and fitted) value is 6.58713620525664
  • the Y value is 6.5895
  • the X value is 6.6768

For the third to the last record in the timeseries:

  • the model's predicted (and fitted) value is 6.59034839014186
  • the Y value is 6.609
  • the X value is 6.67855

For the record before the last record in the timeseries:

  • the model's predicted (and fitted) value is 6.61892751060232
  • the Y value is 6.5815
  • the X value is 6.6917

For the last (most recent) record in the timeseries:

  • the model's predicted (and last fitted) value is 6.56786815053348
  • the Y value is 6.5805
  • the X value is 6.67075

For the first prediction:

  • the model's predicted value is 6.59319101863394
  • the X value is 6.68705
  • (There is no Y value)

I have tried to recreate the predicted values manually without any success. Can anyone help, please?

Newwone
  • What are the *second* to last historical values ($X$, $\hat{y}$ and $y$)? – Stephan Kolassa Jul 20 '20 at 13:48
  • @StephanKolassa, I have included the data in the question. Please let me know if you need additional information. Thanks for helping me! – Newwone Jul 20 '20 at 14:10
  • @StephanKolassa I have added the 3rd and 4th to the last data. Thanks – Newwone Jul 20 '20 at 14:22
  • Explain what you have done, show your calculations. The most likely reasons for the discrepancy: 1) residual presampling and 2) you are not using regression with ARIMA errors (like statsmodels does), but ARIMAX; see details at https://robjhyndman.com/hyndsight/arimax/ – Aksakal Jul 20 '20 at 14:24
  • also, if this is a homework, tag it with self-study – Aksakal Jul 20 '20 at 14:26
  • Hi, @Aksakal. It's not homework/self study. The only difference between what I did and what Stephan did is in the error part. For my errors, I did -1.454e-15 + (-0.1777 * error (t-1)). For the first error forecast, I initially set it to the last residual. My steps to calculate the error follow the approach here - https://stats.stackexchange.com/questions/394593/manually-calculate-sarimax-forecast/399103 – Newwone Jul 20 '20 at 14:56
  • @Aksakal, what did you mean by residual presampling? (I am certainly using regression with ARIMA errors. It's what statsmodels does when you supply an exog timeseries) – Newwone Jul 20 '20 at 15:27

2 Answers


First off, I can't recreate your numbers either, but I'll write down what I did - it may still be helpful.


Judging from the documentation, SARIMAX fits a regression with SARIMA errors. This is not what is commonly called a SARIMAX model. Rob Hyndman's blog post refers to R, but it should also be relevant here.

That is, the model should be

$$ y_t=\beta_0+\beta_1x_t+\epsilon_t $$

with $\epsilon_t\sim\text{ARIMA}(1,1,0)$, or

$$ (\epsilon_t-\epsilon_{t-1}) = \phi(\epsilon_{t-1}-\epsilon_{t-2})+\eta_t $$ with innovations $\eta_t\sim N(0,\sigma^2)$.

So to predict $\hat{y}_t$, we plug in the estimates $\hat{\beta}_0$ and $\hat{\beta}_1$, and we separately predict $\hat{\epsilon}_t$ from $\hat{\phi}$ and the previous errors, using

$$ \epsilon_t = (1+\phi)\epsilon_{t-1}-\phi\epsilon_{t-2}+\eta_t. $$
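
For completeness, here is how that recursion follows from the differenced form, and the point forecast it implies. Setting the future innovation $\eta_t$ to its mean of zero is the usual step, though it is an assumption not spelled out above:

$$ \epsilon_t-\epsilon_{t-1} = \phi\epsilon_{t-1}-\phi\epsilon_{t-2}+\eta_t \;\Longrightarrow\; \epsilon_t = (1+\phi)\epsilon_{t-1}-\phi\epsilon_{t-2}+\eta_t, \qquad \hat{\epsilon}_t = (1+\hat{\phi})\epsilon_{t-1}-\hat{\phi}\epsilon_{t-2}. $$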

For $\epsilon_{t-1}$ and $\epsilon_{t-2}$, we can plug in $\hat{\epsilon}_{t-1}=y_{t-1}-\hat{y}_{t-1}$ and $\hat{\epsilon}_{t-2}=y_{t-2}-\hat{y}_{t-2}$. However, that doesn't seem to work (in R):

> phi <- -0.1777
> epsilon <- (1+phi)*(6.5805-6.56786815053348) - phi*(6.5815-6.61892751060232)
> -1.454e-15 + 0.9949*6.68705 + epsilon
[1] 6.656682
Stephan Kolassa
  • Thanks for trying, Stephan. I appreciate it. – Newwone Jul 20 '20 at 14:58
  • could you please have a look at this issue I am having - https://stats.stackexchange.com/questions/500067/statsmodels-linear-regression-with-arima-errors-model-not-including-the-arima-er – Newwone Dec 09 '20 at 17:11

Basically Stephan's answer has it right, except that his code is not computing $\hat \epsilon_{t-1}$ and $\hat \epsilon_{t-2}$ correctly. Conditional on having observed $y_{t-1}$ and $x_{t-1}$, we should have:

$$\hat \epsilon_{t-1} = y_{t-1} - (\beta_0 + \beta_1 x_{t-1}) = 6.5805 - 6.63672917 = -0.05622917$$

Edit: So, to be clear, conditional on knowing $y_{t-1}$ and $x_{t-1}$, we actually know the value of $\epsilon_{t-1}$, not just an estimate, and so we don't need the "hat" over it.

Then, proceeding similarly for $\epsilon_{t-2}$, we have:

$$\epsilon_{t-1} = -0.05622917 \\ \epsilon_{t-2} = -0.07607233$$

And so the prediction for $\epsilon_t$ is:

$$\hat \epsilon_t = (1 + (-0.1777)) \times (-0.05622917) - (-0.1777) \times (-0.07607233) = -0.059755299532$$

Finally, we can compute the prediction for $y_t$:

$$\hat y_t = -1.454 \times 10^{-15} + 0.9949 \times 6.68705 + (-0.059755299532) = 6.593190745467998$$

This matches the prediction you gave above up to as much precision as we can expect, given that you only provided 4-5 decimals for the data and parameters.
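
The same arithmetic can be sketched in plain Python. This is only an illustration of the steps above, not statsmodels' internal code: the helper names `regression_error` and `one_step_prediction` are made up for this example, and the coefficients are the rounded values quoted in the question, so agreement is limited to roughly six decimals. The sketch also applies the same recursion to the last two in-sample records, on the assumption that the fitted values are one-step-ahead predictions of the same form.

```python
# Estimates as rounded in the question's summary; the full-precision
# values are not shown there, so results only agree approximately.
BETA0 = -1.454e-15  # coefficient on the constant
BETA1 = 0.9949      # coefficient on X
PHI = -0.1777       # AR(1) coefficient of the differenced errors


def regression_error(y, x):
    """Error of the regression part: epsilon = y - (beta0 + beta1 * x)."""
    return y - (BETA0 + BETA1 * x)


def one_step_prediction(x_next, prev, prev2):
    """Predict y at x_next from the two preceding (y, x) observations.

    prev is the observation at t-1 and prev2 the one at t-2.
    """
    eps1 = regression_error(*prev)
    eps2 = regression_error(*prev2)
    # ARIMA(1,1,0) error forecast with the future innovation set to zero.
    eps_hat = (1 + PHI) * eps1 - PHI * eps2
    return BETA0 + BETA1 * x_next + eps_hat


# (y, x) pairs for the last four records from the question, oldest first.
records = [
    (6.5895, 6.6768),
    (6.609, 6.67855),
    (6.5815, 6.6917),
    (6.5805, 6.67075),
]

# First out-of-sample prediction, using the X value given for it.
forecast = one_step_prediction(6.68705, records[3], records[2])
# In-sample one-step predictions for the last two records.
fitted_last = one_step_prediction(records[3][1], records[2], records[1])
fitted_second_last = one_step_prediction(records[2][1], records[1], records[0])

print(f"forecast:              {forecast:.6f}")
print(f"fitted, last:          {fitted_last:.6f}")
print(f"fitted, second to last: {fitted_second_last:.6f}")
```

With the rounded coefficients, all three numbers should land within about 1e-6 of the values quoted in the question (6.593191, 6.567868 and 6.618928). That suggests the in-sample fitted values follow the same recursion, with the errors taken from the regression part rather than from the model's residuals.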

cfulton
  • thank you! Using all the decimals for the data and parameters, I can now match the statsmodels outputs following your guidance. – Newwone Jul 20 '20 at 20:33
  • could you please see if you could help with this issue I am having with Statsmodels prediction? It's not producing results in line with my understanding of the model - https://stats.stackexchange.com/questions/500067/statsmodels-linear-regression-with-arima-errors-model-not-including-the-arima-er – Newwone Dec 09 '20 at 17:11
  • can you please confirm SARIMAX still works in the way you have described? It seems the predictions no longer include the addition of the et. The predictions appear to be just the linear regression component only. See my question here for an example with data - https://stats.stackexchange.com/questions/500067/statsmodels-linear-regression-with-arima-errors-model-not-including-the-arima-er – Newwone Dec 09 '20 at 21:52