Note:
There is another post with a very similar title, but it is really about that author's specific problem with submarines. Mine is more general.
I have asked a similar question before, but it turned out not to be a good question, so I am asking a modified one (a practice recommended by the meta site, I think).
Question starts:
I have two time series $X_t$ and $Y_t$, $t=0,\dots,N$. I want to develop a model that uses $X_t$ to predict $Y_t$. This is not a forecasting question, because in the remote future I won't observe $Y_t$, so I won't be using the history of $Y_t$ to predict its future.
So I thought the easiest approach would be linear regression, `lm(Y ~ X)`, possibly with lagged covariates, except that the residuals $\epsilon_t$ are very clearly autocorrelated. So I thought I could try Rob Hyndman's OLS with ARIMA residuals. But this is where I got stuck. Say you have trained your model and the residuals follow an AR(1). When you use it to predict in the remote future at $t=N+M$, with $M$ large, you can easily get the fixed part $X_{N+M}\hat{\beta}$, but
- How do you get $\hat{\epsilon}_{N+M}$? Do you set it to 0? Or do you do the crazy thing of forecasting it $M$ time steps ahead from $t=N$?
- Say you set it to 0; what about $\hat{\epsilon}_{N+M+1}$? Do you then run your fitted AR(1) forward, starting from 0 for $\hat{\epsilon}_{N+M}$?
- On that note, what if the residuals are ARIMA? Do you set the first difference to be 0?
- Is there a better approach? How about `nlme::gls` with a `corARMA()` correlation structure (although it seems to be doing the same thing)?
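
For concreteness, the setup behind these bullets can be written out as follows. This is only a sketch of the standard regression model with AR(1) errors; the symbols $\phi$, $w_t$ and $\sigma_w^2$ are my own notation, and I am assuming the error process is stationary ($|\phi|<1$).

$$
\begin{aligned}
Y_t &= X_t\beta + \epsilon_t, \\
\epsilon_t &= \phi\,\epsilon_{t-1} + w_t, \qquad w_t \sim \text{i.i.d. } (0, \sigma_w^2), \\
\mathbb{E}[\epsilon_{N+M} \mid \epsilon_N] &= \phi^{M}\,\epsilon_N, \\
\operatorname{Var}[\epsilon_{N+M} \mid \epsilon_N] &= \sigma_w^2\,\frac{1-\phi^{2M}}{1-\phi^{2}}.
\end{aligned}
$$

Under these assumptions the $M$-step forecast of the error shrinks geometrically towards 0 and its variance approaches the unconditional value $\sigma_w^2/(1-\phi^2)$, so for large $M$ the two options in the first bullet should give nearly the same point prediction. The argument relies on stationarity, so it does not by itself cover the ARIMA case in the third bullet.
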
[Edit: this is really a general methodology question. But if you want something more concrete, say both $X_t$ and $Y_t$ are daily data over 4 years, so seasonality may be present.]