
I have noticed that the SARIMAX model in statsmodels does not produce the `fittedvalues` I expected when the model is specified as an ARMA. The example below shows the discrepancy between the value I expected and the value fitted by the SARIMAX model.

Code:

import pandas as pd
import statsmodels.api as sm


def sarimax_model():
    # Four annual observations, 2000-2003
    index = pd.period_range(start='2000', periods=4, freq='A')
    original_observations = pd.Series([1.2, 1.5, 1.0, 0.8], index=index)

    # ARMA(1, 1) specified through SARIMAX
    mod = sm.tsa.SARIMAX(original_observations, order=(1, 0, 1))
    res = mod.fit()

    print("Input data:\n", original_observations)
    print("Model parameters:\n", res.params, "\n")
    print("Model residuals:\n", res.resid, "\n")
    print("Fitted values:\n", res.fittedvalues, "\n")

    # Expected value for 2001
    # val_2001 = 0.948959 * 1.200000 + (-0.044637) * 1.200000
    # .iloc[0] selects the first residual by position (the index is a PeriodIndex)
    val_2001 = res.arparams * res.data.endog[0] + res.maparams * res.resid.iloc[0]

    print("Expected fitted values for 2001:", "\n", val_2001, "\n")


if __name__ == '__main__':
    sarimax_model()

Output:

Model parameters:
 ar.L1     0.948959
ma.L1    -0.044637
sigma2    0.121073
dtype: float64 

Model residuals:
 2000    1.200000
2001    0.367058
2002   -0.407083
2003   -0.167130
Freq: A-DEC, dtype: float64 

Fitted values:
 2000    0.000000
2001    1.132942
2002    1.407083
2003    0.967130
Freq: A-DEC, dtype: float64 

Expected fitted values for 2001: 
 [1.08518626]

I wonder whether I am missing something here or whether the SARIMAX model is simply incorrect. The SARIMAX model produces the correct answer when it is specified as a pure AR model.

Glad to join this community.

Solo :)

Solo
  • Thanks for the link @cfulton. At the beginning of the sample the error term is unknown (or its estimate is inaccurate), and the optimal parameters of the model are also unknown. Hence, the predictions at the beginning of the sample are affected by inaccurate estimates of the errors and non-optimal parameters. Why aren't the fitted values recalculated with the optimal parameter values? I understand this cannot be done (or would be meaningless) for forecasting, but for a training sample (historical data) it seems reasonable if the aim is to find the best fitted values. Solo :) – Solo Jul 28 '21 at 07:25
  • For a results object constructed using `fit`, all output is computed using the optimal parameters. For time series models in Statsmodels, `fittedvalues` is defined to be the one-step-ahead predictions, and `resid` is defined to be the one-step-ahead prediction error. The issue here is that `resid` simply does not correspond to the best estimate of the MA error term at the beginning of the sample. This is just by definition and is the nature of MA processes, and there is nothing that can be done about it. It is not because things aren't computed using optimal parameters. – cfulton Jul 28 '21 at 23:39
  • Thank you @cfulton! Solo :) – Solo Jul 29 '21 at 00:11
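
To make the point in the comments concrete, here is a small sketch that reproduces the 2001 fitted value by hand from the numbers printed in the question. It rests on one assumption that is not stated in the thread: that the model starts from the stationary distribution of the ARMA(1, 1) process. Under that assumption, the best estimate of the 2000 error given only the 2000 observation is a shrunken version of the observation, not the full `resid[0]` of 1.2. The names `shrink`, `eps_2000_hat`, `naive` and `filtered` are illustrative only.

```python
# Parameters and first observation copied from the output in the question.
phi = 0.948959     # ar.L1
theta = -0.044637  # ma.L1
y_2000 = 1.2

# Assumption: stationary initialization of y_t = phi*y_{t-1} + eps_t + theta*eps_{t-1}.
# Then Var(y_1) = sigma2 * (1 + 2*phi*theta + theta**2) / (1 - phi**2)
# and Cov(eps_1, y_1) = sigma2, so sigma2 cancels in the regression coefficient.
shrink = (1 - phi**2) / (1 + 2 * phi * theta + theta**2)

# Best linear estimate of the 2000 error given only the 2000 observation.
eps_2000_hat = shrink * y_2000

# The calculation in the question plugs resid[0] = 1.2 in as the MA error.
naive = phi * y_2000 + theta * y_2000

# One-step-ahead prediction for 2001 using the estimated error instead.
filtered = phi * y_2000 + theta * eps_2000_hat

print("Using resid[0] as the error:  ", round(naive, 6))
print("Using the estimated error:    ", round(filtered, 6))
```

The first number agrees with the expected value computed in the question (about 1.0852), and the second agrees with the 2001 entry under "Fitted values" (about 1.1329) up to rounding of the printed parameters. That agreement is consistent with the explanation above: the first residual is the one-step-ahead prediction error from a prediction of zero, which is not the same thing as the estimate of the MA error term for 2000.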
