Do differencing within ARIMA or do differencing first before fitting ARIMA

Question

There is a similar question about differencing within ARIMA versus differencing before fitting ARIMA. The answers there state that differencing before fitting ARIMA gives different results from differencing within ARIMA.

My question would be:

Which is recommended: differencing within ARIMA or before fitting ARIMA? Or:
What are instances in which it is better to difference within ARIMA than to difference before fitting ARIMA, and vice versa?
Are the differences substantial when you do one rather than the other?

Assuming you're trying to generate a stationary series, you always difference before you decide on the model. Then you check whether the series looks more stationary after differencing. Then, when you FIT the model, you can either difference the series and call the arima(p,0,q) function, or use the undifferenced series and call the arima(p,d,q) function, where d is the order of differencing. They have to give the same result but, in order to get the same result, you need to "undifference" the series if you use the first type of call. Or you can difference the resulting series from the second call. — mlofton, Jun 11 '20 at 03:14
Could you link the answer that says there is a difference between the two methods? — Richard Hardy, Jun 11 '20 at 05:40
https://stats.stackexchange.com/questions/32634/difference-time-series-before-arima-or-within-arima — Marco Alexis, Jun 11 '20 at 05:53
the answer of sir Rob. https://stats.stackexchange.com/a/32799/288065 — Marco Alexis, Jun 11 '20 at 05:58
Marco: I should have said that, THEORETICALLY, there should be no difference. What Rob is saying is that R uses a different algorithm, which causes differences. So, if you use the arima model in R, it sounds like you should let R deal with the differencing. That said, if the algorithm did not change and you differenced first and called arima with d set to zero, it would still be possible to back out the undifferenced model so that it is equivalent to the one fitted with d set to the order of differencing. — mlofton, Jun 11 '20 at 20:19
Note that there are other issues with calling the arima function in base R. I don't know if it's still around, but Dr. Stoffer from the University of Pittsburgh had a page on his website devoted to criticism of the arima function in base R. You should probably check it out before you continue fitting arima models in R. Note that there are many R packages that have their own arima functions, so they may have dealt with the deficiencies that Dr. Stoffer describes. — mlofton, Jun 11 '20 at 20:21
Here's the link that I was referring to. I don't know how much of it still applies but it's worth checking out. https://www.stat.pitt.edu/stoffer/tsa2/Rissues.htm Note that it used to be easier to read. It's an old link and I think he has changed the style of his website since; for me, it's almost too difficult to bother reading. — mlofton, Jun 12 '20 at 03:51
@mlofton, see also https://www.stat.pitt.edu/stoffer/tsa4/Rissues.htm — Richard Hardy, Jun 12 '20 at 05:53
Thank you very much. I am actually using Python. So theoretically, I can do it either way. — Marco Alexis, Jun 12 '20 at 06:50
Thanks Richard. I think it's the same link in content but yours is easier to see. — mlofton, Jun 12 '20 at 16:31
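
The "undifference" step mentioned in the comments above can be sketched in plain Python. This is only an illustration under stated assumptions, not code from the thread: the helper names `difference` and `undifference` and the toy series `y` are hypothetical, and no model is fitted. It shows that differencing discards the first value of the series at each round, so those values must be kept in order to map anything on the differenced scale back to levels.

```python
from itertools import accumulate


def difference(series, d=1):
    """Apply the first-difference operator (1 - L) to `series` d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def undifference(diffed, initial_values):
    """Invert `difference`.

    initial_values[k] is the first value of the series after k rounds of
    differencing (k = 0, ..., d - 1); differencing discards these values.
    """
    out = list(diffed)
    # Undo the last round of differencing first, then work back to levels.
    for start in reversed(initial_values):
        out = list(accumulate(out, initial=start))
    return out


# Hypothetical toy series in levels.
y = [10.0, 12.0, 15.0, 14.0, 18.0, 23.0]

w = difference(y, d=1)            # series an ARMA(p, q) would be fitted to
restored = undifference(w, [y[0]])

w2 = difference(y, d=2)           # second-order differencing
restored2 = undifference(w2, [y[0], w[0]])
```

With a fitted model, the same cumulative-sum step would be applied to forecasts made on the differenced scale, starting from the last observed level rather than the first.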

Henry · Answer 1 · 2020-06-11T21:08:57.527

If you have a process $y_t$ that you assume can be modelled as an $ARIMA(p,d,q)$, then it makes no difference whether you first transform your data by differencing, $(1-L)^d y_t=\Delta^d y_t$ with the lag operator defined by $Ly_t=y_{t-1}$, and then fit an $ARMA(p,q)$ model, or use some routine to fit the appropriate $ARIMA(p,d,q)$ model directly.
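
Written out with lag polynomials, the equivalence looks as follows. The symbols $\phi(L)$ (AR polynomial of order $p$), $\theta(L)$ (MA polynomial of order $q$), $\varepsilon_t$ (white noise) and $w_t$ (the differenced series) are notation assumed here for illustration, and the sketch assumes a model without a constant term:

$$\phi(L)\,(1-L)^d\,y_t=\theta(L)\,\varepsilon_t \qquad \text{(ARIMA}(p,d,q)\text{ for } y_t\text{)}$$

$$w_t:=(1-L)^d\,y_t \quad\Longrightarrow\quad \phi(L)\,w_t=\theta(L)\,\varepsilon_t \qquad \text{(ARMA}(p,q)\text{ for } w_t\text{)}$$

Both lines describe the same model with the same $\phi$ and $\theta$; only the series handed to the estimation routine differs.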

The only difference from a computational perspective arises if your estimation procedure uses statistical tools to determine the order of integration $d$ automatically (the number of times to difference your data until it is stationary). Then you can encounter the following scenarios:

You have differenced your data although the automatic routine would not choose to do so. In this case you have made the right call if you are justified in rejecting the order of integration proposed by the software. Otherwise you have overdifferenced and will lose information, since the relationship in levels could have been exploited.
You have not differenced the data enough, so fitting an ARMA instead of an ARIMA makes a difference because the automatic order determination would difference one more time. In this case, certain asymptotic results break down since your model may be non-stationary. Model estimation will still be consistent though.
You difference exactly the number of times suggested by the software. In this case there is absolutely no difference for ordinary software packages, as the order of integration is not estimated together with the AR and MA components but is usually determined by some sort of unit root test before the actual estimation happens.
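
The overdifferencing scenario in the list above can be illustrated numerically. The following is a minimal sketch in plain Python, not taken from the answer: it simulates a random walk (an assumed example of a series that needs exactly one difference), and the helper names `difference` and `lag1_autocorr`, the seed and the sample size are all hypothetical choices. Differencing once should leave roughly uncorrelated noise, while differencing twice should induce a lag-1 autocorrelation near the theoretical value of -0.5 for a differenced white-noise series.

```python
import random


def difference(series):
    """Apply the first-difference operator (1 - L) once."""
    return [b - a for a, b in zip(series, series[1:])]


def lag1_autocorr(series):
    """Sample autocorrelation at lag 1."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    num = sum(a * b for a, b in zip(dev, dev[1:]))
    den = sum(x * x for x in dev)
    return num / den


# Simulate a random walk y_t = y_{t-1} + e_t with Gaussian white noise e_t.
rng = random.Random(0)            # fixed seed, chosen arbitrarily
walk = []
level = 0.0
for _ in range(5000):
    level += rng.gauss(0.0, 1.0)
    walk.append(level)

once = difference(walk)           # correct order: recovers the white noise
twice = difference(once)          # overdifferenced: e_t - e_{t-1}

r_levels = lag1_autocorr(walk)    # expected to be close to 1
r_once = lag1_autocorr(once)      # expected to be close to 0
r_twice = lag1_autocorr(twice)    # expected to be close to -0.5

print(f"levels: {r_levels:.3f}, d=1: {r_once:.3f}, d=2: {r_twice:.3f}")
```

A strongly negative lag-1 autocorrelation after differencing is a common informal hint that one difference too many has been taken, although it is not a substitute for a formal unit root test.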

In conclusion, the pros and cons of differencing beforehand versus letting, e.g., an R function do it automatically depend on how confident you are that you, rather than the routine, can make the better judgement given the data you are trying to model.

@Henry: Marco pointed to a link where Rob Hyndman explained that the algorithm used by the arima function is slightly different when $d = 0$ versus when $d = 1$. So, technically speaking, they won't be the same. But, theoretically speaking, I agree with you. There's no difference conceptually. — mlofton, Jun 12 '20 at 16:34
I think Rob Hyndman was talking about a particular function, but of course, depending on the function you use, ARMA and ARIMA routines have different default settings, treatment of the constant, etc. — Henry, Jun 12 '20 at 16:40
"Model estimation will still be consistent though": would you have a reference for this? Fitting an AR model to an ARI series would consistently estimate the AR coefficients (for the AR polynomial with a unit root). It is not so clear that this still holds when, for example, fitting an MA model to an IMA series. — Michael, Jun 14 '20 at 03:47
