This might be due to my relative inexperience with time series modelling, but I am confused about the correct number of observations to report for an ARIMA/ARIMAX model. I couldn't find any post that addresses this directly (though "Number of observations used for ARIMA modeling" comes close).

Say I run the following model:

fit1 <- arima(lh, order = c(0,1,0))

And then check the number of “used” observations (wording from the documentation):

fit1$nobs
length(lh)

The number of observations is one less than the total length of the time series, because we difference it once (ARIMA(0,1,0)). Fair enough. But if I then add a lag:

fit2 <- arima(lh, order = c(1,1,0))
fit2$nobs

The number of “used” observations is the same, which confuses me, since I would have expected to lose an additional observation at the beginning of the series. How can we have a value for the lag at the first observation? The same goes for MA terms:

fit3 <- arima(lh, order = c(0,1,1))
fit3$nobs

How can we have a value for the lag of the error at the first observation? Clearly I’m missing something.
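
The three fits can also be compared side by side. This is a minimal sketch, assuming the default fitting method of `stats::arima` and using its documented `nobs` and `n.cond` components; the names `orders`, `fits`, and `summary_tab` are illustrative and not part of any package.

```r
# Fit the three models from above on the built-in `lh` series.
orders <- list(
  "ARIMA(0,1,0)" = c(0, 1, 0),
  "ARIMA(1,1,0)" = c(1, 1, 0),
  "ARIMA(0,1,1)" = c(0, 1, 1)
)
fits <- lapply(orders, function(ord) arima(lh, order = ord))

# Collect the series length, the "used" observations, and the number of
# initial observations that arima() reports as not used in the fitting.
summary_tab <- data.frame(
  n_series = length(lh),
  nobs     = vapply(fits, function(f) f$nobs, numeric(1)),
  n_cond   = vapply(fits, function(f) f$n.cond, numeric(1))
)
print(summary_tab)
```

`nobs` comes out as `length(lh) - 1` for all three models, matching what is described above; `n.cond` is printed alongside it because it may be the quantity that reflects the lag structure.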

It gets even a little bit more confusing if I want to incorporate transfer functions with the arimax function from the TSA package, since arimax doesn’t return a nobs object nor does it have a nobs method.

I would greatly appreciate some help on this!

Best,

Bertel


1 Answer

The issue here is counting the number of estimable equations. Introducing AR structure in the errors can reduce the number of estimable equations. Lag structures in the predictors have no effect as long as each lag is less than or equal to the model-implied lag of Y. If a predictor lag exceeds the model-implied lag of Y (based on the differencing of Y and the AR structure of the error process), the number of estimable equations is reduced by the difference.

Degrees of freedom = number of estimable equations minus the number of parameters estimated.

For example, if we have NOB observations and a first-difference operator in the error structure, we have NOB-1 estimable equations.

If we introduce one lag of X in the model, this doesn't change the number of estimable equations. If we introduce a lag of 2 for the X variable, this reduces the number of estimable equations to NOB-2.
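
The counting rule can be sketched as a small helper. This is only an illustration under stated assumptions: the function name `count_equations` and its arguments are hypothetical, the model-implied lag of Y is taken to be the AR order plus the order of differencing, and the parameter count J covers only the AR and MA coefficients (no intercept and no coefficients on X).

```r
# Hypothetical helper: counts estimable equations (M), parameters (J),
# and degrees of freedom for an ARIMA(p, d, q) model on n observations.
# max_x_lag is the largest lag of any predictor X (0 if there is none).
count_equations <- function(n, p, d, q, max_x_lag = 0) {
  implied_lag <- p + d                   # assumed model-implied lag of Y
  lost <- max(implied_lag, max_x_lag)    # X lags matter only beyond that
  m <- n - lost                          # number of estimable equations
  j <- p + q                             # assumed: AR and MA coefficients only
  c(N = n, M = m, J = j, df = m - j)
}

# First difference only, then with X lagged by 1 and by 2.
count_equations(100, p = 0, d = 1, q = 0)
count_equations(100, p = 0, d = 1, q = 0, max_x_lag = 1)
count_equations(100, p = 0, d = 1, q = 0, max_x_lag = 2)
```

With `n = 100`, the first two calls give M = 99 and the third gives M = 98, mirroring the NOB-1 and NOB-2 cases above.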

IrishStat
  • This is incorrect: "observations" count *data,* not model degrees of freedom. – whuber Aug 14 '19 at 12:37
  • This is not a clear sentence; please clarify. I have added more content to my answer. – IrishStat Aug 14 '19 at 12:48
  • +1 Your clarification resolved my misunderstanding of your original answer--thank you. – whuber Aug 14 '19 at 12:53
  • @IrishStat: Thanks for the reply and the explanation (and sorry for my slow response)! So what would you report for N (the number of observations) in, let's say, an ARIMA(3,1,1) model estimated on a time series of length 100? Or an ARIMA(1,1,3)? Or would you not report N at all? – Bertel Aug 31 '19 at 13:19
  • The sample size is always N. The degrees of freedom associated with the error process are based on the number of estimable relationships (say M) minus the number of estimated parameters (say J). For a (3,1,1) model this would be J=4 and M=N-4, thus degrees of freedom = M-J. For a (1,1,3) model this would be J=4 and M=N-2, thus degrees of freedom = M-J. I would present all four, i.e. N, M, J, and the degrees of freedom. – IrishStat Aug 31 '19 at 15:58
  • OK, @IrishStat, thank you, that makes sense! (You mean "M=N-4" in the last example, right? Since J=4.) – Bertel Sep 05 '19 at 11:42
  • Degrees of freedom = N-2-4: since M=N-2, this is degrees of freedom = M-4. – IrishStat Sep 05 '19 at 11:57