Difference time series before Arima or within Arima

Question

Is it better to difference a series (assuming it needs it) before using an Arima OR better to use the d parameter within Arima?

I was surprised how different the fitted values are depending on which route is taken with the same model and data. Or am I doing something incorrectly?

# install.packages("forecast")   # run once if the package is not installed
library(forecast)

# Subset of the wineind series shipped with the forecast package
wineindT <- window(wineind, start = c(1987, 1), end = c(1994, 8))
wineindT_diff <- diff(wineindT)

# Coefficients and other measures are similar
modA <- Arima(wineindT, order = c(1, 1, 0))
summary(modA)
modB <- Arima(wineindT_diff, order = c(1, 0, 0))
summary(modB)

# Fitted values from modA
# (recent versions of forecast no longer export forecast.Arima; call the generic forecast())
A <- forecast(modA, h = 1)$fitted

# Fitted values from modB, setting the initial value to the first value in the original series
B <- diffinv(forecast(modB, h = 1)$fitted, xi = wineindT[1])

plot(A, col = "red")
lines(B, col = "blue")

ADD:

Please note that I am differencing the series once and fitting an ARIMA(1,0,0) to the result; separately, I am fitting an ARIMA(1,1,0) to the original series. I am (I think) reversing the differencing on the fitted values from the ARIMA(1,0,0) fitted to the differenced series.

I am comparing the fitted values - not the predictions.

Here is the plot (red is arima(1,1,0) and blue is the arima (1,0,0) on the differenced series after changing back to the original scale) :

[Plot of the two fitted series]

Response to Dr. Hyndman's Answer:

1) Can you illustrate in R code what I would need to do in order to get the two fitted values (and presumably forecasts) to match (allowing for small difference due to your first point in your answer) between Arima (1,1,0) and Arima(1,0,0) on the manually differenced series? I assume this has to do with the mean not being included in modA, but I am not entirely sure how to proceed.

2) Regarding your #3. I know I am missing the obvious, but are not $\hat{X}_t = X_{t-1} + \phi(X_{t-1}-X_{t-2}) $ and $\hat{Y}_t = \phi (X_{t-1}-X_{t-2})$ the same when $\hat{Y}_t$ is defined as $\hat{X}_t - X_{t-1}$? Are you saying I am "undifferencing" incorrectly?

Regarding your update. 1) I can see no point in doing this. Arima() will produce the fitted values and forecasts. Why should I produce additional R code to do the same thing as Arima() already does? 2) Yes, but differencing X-hat does not give you Y-hat. So undifferencing Y-hat does not give you X-hat. — Rob Hyndman, Jul 24 '12 at 01:05
Thanks. 1) It was a learning exercise for me. 2) I think what I am getting from this is that my error in the original question (using diffinv) was in using the fitted values rather than the original values (?), which leads back to #1: how to properly undifference the data. I know Arima will do it; I am just trying to follow a book example using the equations. — B_Miner, Jul 24 '12 at 01:18

score 20 · Answer 1 · answered Jul 23 '12 at 00:07

There are several issues here.

1) If you difference first, then Arima() will fit a model to the differenced data. If you let Arima() do the differencing as part of the estimation procedure, it will use a diffuse prior for the initialization. This is explained in the help file for arima(). So the results will be different due to the different ways the initial observation is handled. I don't think it makes much difference in terms of the quality of the estimation. However, it is much easier to let Arima() handle the differencing if you want forecasts or fitted values on the original (undifferenced) data.

2) Apart from differences in estimation, your two models are not equivalent because modB includes a constant while modA does not. By default, Arima() includes a constant when $d=0$ and no constant when $d>0$. You can override these defaults with the include.mean argument.

3) Fitted values for the original data are not equivalent to the undifferenced fitted values on the differenced data. To see this, note that the fitted values on the original data are given by $$\hat{X}_t = X_{t-1} + \phi(X_{t-1}-X_{t-2})$$ whereas the fitted values on the differenced data are given by $$\hat{Y}_t = \phi (X_{t-1}-X_{t-2})$$ where $\{X_t\}$ is the original time series and $\{Y_t\}$ is the differenced series. Thus $$\hat{X}_t - \hat{X}_{t-1} \ne \hat{Y}_t.$$
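
The points above suggest one way the two sets of fitted values could be brought into line. What follows is a minimal sketch, not part of the original answer. It assumes the forecast package and the same window of wineind used in the question; the names x, fitA and fitB are illustrative only. The constant is switched off in the model for the differenced series, and each fitted difference is added to the previous observed value rather than accumulated with diffinv().

```r
library(forecast)

wineindT <- window(wineind, start = c(1987, 1), end = c(1994, 8))

# ARIMA(1,1,0) on the original series (no constant by default when d > 0)
modA <- Arima(wineindT, order = c(1, 1, 0))

# AR(1) on the manually differenced series, with the constant switched off
modB <- Arima(diff(wineindT), order = c(1, 0, 0), include.mean = FALSE)

# One-step fitted values on the original scale: X-hat_t = X_{t-1} + Y-hat_t
x <- as.numeric(wineindT)
n <- length(x)
fitB <- x[-n] + as.numeric(fitted(modB))   # fitted values for t = 2, ..., n
fitA <- as.numeric(fitted(modA))[-1]       # drop t = 1 so both cover t = 2, ..., n

# Remaining discrepancies should come from how the first observation is initialised
summary(fitA - fitB)
```

Any remaining differences would be expected to be small and to come from the initialization described in point 1, since the two routes estimate $\phi$ from slightly different likelihoods.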

+1, I was going to give point 2 as an answer. Kudos for including the other two. — mpiktas, Jul 23 '12 at 03:42
Dr. Hyndman, thank you for the response! I have a lot to learn about time series analysis. Can I ask a follow-up? I am not sure I know exactly what to do with this information, so I am posting an addition to my original question. — B_Miner, Jul 23 '12 at 12:30

IrishStat · Answer 2 · 2012-07-23T21:26:29.477

Sometimes you need to remove local means to make the series stationary. If the original series has an acf that doesn't die out this can be due to a level/step shift in the series. The remedy is to de-mean the series.

RESPONSE TO BOUNTY:

The way to get the same results/fitted values is this: after physically differencing the original Y(t) series to get the first difference (dely), estimate an AR(1) without a constant. This is tantamount to fitting an OLS model of the form dely(t) = B1*dely(t-1) + a(t) WITHOUT an intercept. The fitted values from this model, suitably integrated of order 1, will (I believe) give you the fitted values of the model [1-B][AR(1)]Y(t) = a(t). Most pieces of software, with the noted exception of AUTOBOX, will NOT ALLOW you to estimate an AR(1) model without a constant. Here is the equation for dely, written as $\Delta Y_t$: $$\Delta Y_t = (1 - 0.675B)^{-1} a_t$$ while the equation for Y was $$(1 - B) Y_t = (1 - 0.676B)^{-1} a_t.$$

Note the rounding error caused by the physical differencing of Y. Note that whether or not differencing is in effect in the model, the user can select whether to include or exclude the constant. The normal process is to include a constant for a stationary (i.e. undifferenced) ARIMA model and to optionally include a constant when differencing is in the model. It appears that the alternative approach (Arima) forces a constant into a stationary model, which in my opinion has caused your dilemma.
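
The comparison described above can be sketched in R. This is a rough illustration rather than a reproduction of the AUTOBOX output: it assumes the forecast package and the question's window of wineind, and the names dely, ols, ar1 and arima110 are chosen here for illustration. In R, an AR(1) without a constant can be requested through the include.mean argument of Arima().

```r
library(forecast)

wineindT <- window(wineind, start = c(1987, 1), end = c(1994, 8))
dely <- as.numeric(diff(wineindT))
m <- length(dely)

# OLS regression of dely(t) on dely(t-1) with no intercept ("0 +" drops it)
ols <- lm(dely[-1] ~ 0 + dely[-m])

# AR(1) without a constant, fitted to the differenced series
ar1 <- Arima(dely, order = c(1, 0, 0), include.mean = FALSE)

# ARIMA(1,1,0) fitted to the original series
arima110 <- Arima(wineindT, order = c(1, 1, 0))

# Put the three estimates of the AR coefficient side by side
c(ols      = unname(coef(ols)),
  ar1      = unname(coef(ar1)["ar1"]),
  arima110 = unname(coef(arima110)["ar1"]))
```

The OLS estimate conditions on the first difference while the two Arima() fits use a likelihood, so the three coefficients need not agree exactly.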

Should that impact the fitted values in this case between the arima (1,0,0) on delta-y and arima(1,1,0) on y? — B_Miner, Jul 19 '12 at 19:08
In both cases you are fitting an AR(1) to the first difference of the time series, right? If that is the case and the methods of fit are the same, they should be doing exactly the same thing. There isn't even a difference in the order of operations. — Michael R. Chernick, Jul 19 '12 at 19:27
Doesn't seem to be the case here. Perhaps @Rob_Hyndman will check in. — B_Miner, Jul 19 '12 at 20:54

score 1 · Answer 3 · answered Jul 19 '12 at 18:02

I don't know why there would be a difference in the results unless somehow you are differencing more times one way than the other. For an ARIMA(p,d,q), the d differences are done first, before any model fitting. Then the stationary ARMA(p,q) model is fit to the differenced series. The assumption is that after the removal of polynomial trends, the remaining series is stationary. The number of differences corresponds to the order of the polynomial that you want to remove: for a linear trend you take one difference, and for a quadratic trend you take two. I don't agree with most of what was said in John's answer.
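
The link between the number of differences and the order of the polynomial trend can be checked with a small base R sketch. The series below are made-up deterministic trends with no noise, used only for illustration.

```r
time_index <- 1:20
linear    <- 3 + 2 * time_index
quadratic <- 1 + 0.5 * time_index + 0.25 * time_index^2

# One difference removes a linear trend: every value equals the slope
d1_linear <- diff(linear)

# One difference of a quadratic trend still leaves a linear trend
d1_quadratic <- diff(quadratic)

# Two differences remove a quadratic trend: every value equals 2 * 0.25
d2_quadratic <- diff(quadratic, differences = 2)

print(d1_linear)
print(d1_quadratic)
print(d2_quadratic)
```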

score 0 · Answer 4 · answered Jul 19 '12 at 17:51

One reason to difference an I(1) series is to make it stationary. Presuming you have the correct specification for the ARIMA model, the residuals from the model will have the autoregressive and moving average components removed and should be stationary. In that respect it can make sense to use the residuals from the model rather than differencing. However, in circumstances where you have a lot of data that you think is approximately I(1), some people will just difference the data rather than estimate the full ARIMA model. The ARIMA model can fit a whole host of time series problems where it may not make sense to difference. For instance, if the data exhibit mean reversion, differencing may not be appropriate, since the series may not be I(1).

John

2,117
16
24

Would you expect the differences to be this large? It made me think I was doing something incorrectly in how I was reverting from differences to the original. – B_Miner Jul 19 '12 at 18:05
Could you explain exactly what you did? I am not good at reading R code. If you take the same number of differences both ways and fit the same ARMA model after differencing you should get the same results as long as the fitting techniques are the same (usually conditional least squares is used). – Michael R. Chernick Jul 19 '12 at 18:16
He takes some data, fits an ARIMA(1,1,0), then takes the differences and fits an ARIMA(1,0,0). Finally, he compares the one period out of sample forecasts to each other. Presumably they are different, but we can't see the graphs in the post. – John Jul 19 '12 at 18:42
A simple example of why what I'm saying makes sense is if you consider $y_{t}=\beta y_{t-1}+\epsilon_{t}$. Taking differences gives $\triangle y_{t}=\left(\beta-1\right)y_{t-1}+\epsilon_{t}$. However, if you want to find $\epsilon_{t}$, you wouldn't get the same answer if you estimate $\triangle y_{t}=\beta\triangle y_{t-1}+\epsilon_{t}$ – John Jul 19 '12 at 18:46
Michael I added to the question. – B_Miner Jul 19 '12 at 18:59
@John you have it wrong $\Delta y_t$ is $y_t-y_{t-1}=\beta(y_{t-1}-y_{t-2})+e_t-e_{t-1}$. Both methods fit beta by the formula I have given. – Michael R. Chernick Jul 19 '12 at 19:32
Finally right. I can't do LaTeX in 5 minutes! As best as I can tell the above equation comes out both ways. – Michael R. Chernick Jul 19 '12 at 20:01
@MichaelChernick I write my LaTeX using Lyx and then copy my stuff over here. WAY faster. Anyway, I believe the model you're talking about is an ARIMA(1,1,1). The first one I talk about is ARIMA(1,0,0), even after subtracting the lag it is still ARIMA(1,0,0). The third model I mention is ARIMA(1,1,0). ARIMA(1,0,0) is not the same as ARIMA(1,1,0) and you will get different results. – John Jul 19 '12 at 20:32
@John I was looking at ARIMA(1,1,0) but you are right that the term $e_{t-1}$ should not appear on the right hand side. You are assuming that he fits an AR(1) first and then takes the difference. That of course would be different from taking the difference and then fitting the AR model. But if the series is nonstationary you should do the differencing first. The two formulae you provide are correct and do show the difference between the two approaches. But also since beta is estimated prior to differencing in one case and after in the other the estimated betas may be very different. – Michael R. Chernick Jul 19 '12 at 20:57
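
For reference, the equations debated in these comments can be laid side by side. This is a sketch in the thread's own notation; the symbol $\phi$ is introduced here only to keep the ARIMA(1,1,0) coefficient distinct from $\beta$. Starting from the AR(1) model $y_t = \beta y_{t-1} + \epsilon_t$:

$$
\begin{aligned}
\text{subtract } y_{t-1} \text{ from both sides:} \quad & \Delta y_t = (\beta - 1)\, y_{t-1} + \epsilon_t \\
\text{subtract the lagged equation:} \quad & \Delta y_t = \beta\, \Delta y_{t-1} + \epsilon_t - \epsilon_{t-1} \\
\text{ARIMA(1,1,0), a different model:} \quad & \Delta y_t = \phi\, \Delta y_{t-1} + \epsilon_t
\end{aligned}
$$

The first two lines are rewritings of the same AR(1) process, while the third has no $\epsilon_{t-1}$ term, which appears to be the distinction the comments converge on.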
