1-step-ahead predictions with dynlm R package

Question

I've fit a model with several independent variables, one of which is the lag of the dependent variable, using the dynlm package.

Assuming I have 1-step-ahead forecasts for my independent variables, how do I get 1-step-ahead forecasts for my dependent variables?

Here is an example:

library(dynlm)

y<-arima.sim(model=list(ar=c(.9)),n=10) #Create AR(1) dependant variable
A<-rnorm(10) #Create independant variables
B<-rnorm(10)
C<-rnorm(10)
y<-y+.5*A+.2*B-.3*C #Add relationship to independant variables 
data=cbind(y,A,B,C)

#Fit linear model
model<-dynlm(y~A+B+C+L(y,1),data=data)

#Forecast
A<-c(A,rnorm(1)) #Assume we already have 1-step forecasts for A,B,C
B<-c(B,rnorm(1))
C<-c(C,rnorm(1))
y=window(y,end=end(y)+c(1,0),extend=TRUE)
newdata<-cbind(y,A,B,C)
predict(model,newdata)

And here is an example using the dyn package, which works.

library(dyn)

#Fit linear model
model<-dyn$lm(y~A+B+C+lag(y,-1),data=data)

#Forecast
predict(model,newdata)the dyn packages, which works:

Using only `dynlm` package will not provide forecasts for your dependent variables. Providing forecasts for your dependent variables will require a model to explain them and probably additional data. I suggest you to read something about multivariate regression such as "Applied Multivariate Statistical Analysis" by Johnson and Wichern. or a course on forecasting: http://www.duke.edu/~rnau/411home.htm — deps_stats, Jan 31 '11 at 22:32
@deps_stats The dependent variable is what I want to forecast. I'm assuming that I already have forecasts for my independent variables. In my example code, y is the dependent variable I am trying to forecast, and A,B,C are the independent variables, which I already have forecasts for. If you run the example code I posted, you will understand the nature of my problem. — Zach, Jan 31 '11 at 23:31
@Zach: Nice Kaggle rating! (I clicked through the link in your mouse-over profile) — Hugh Perkins, Oct 30 '12 at 04:03

mpiktas · Accepted Answer · 2012-02-17T09:20:06.207

Congratulations, you have found a bug. Prediction for dynlm with new data is broken if lagged variables are used. To see why look at the output of

predict(model)
predict(model,newdata=data)

The results should be the same, but they are not. Without newdata argument, the predict function basically grabs model element from the dynlm output. With newdata argument predict tries to form new model matrix from newdata. Since this involves parsing formula supplied to dynlm and the formula has function L, which is defined only internaly in function dynlm, the incorrect model matrix is formed. If you try to debug, you will see, that the lagged dependent variable is not being lagged in the case of newdata argument is supplied.

What you can do is to lag the dependent variable and include it in the newdata. Here is the code illustrating this approach. I use set.seed so it would be easily reproducible.

library(dynlm)

set.seed(1)
y<-arima.sim(model=list(ar=c(.9)),n=10) #Create AR(1) dependant variable
A<-rnorm(10) #Create independant variables
B<-rnorm(10)
C<-rnorm(10)
y<-y+.5*A+.2*B-.3*C #Add relationship to independant variables 
data=cbind(y,A,B,C)

#Fit linear model
model<-dynlm(y~A+B+C+L(y,1),data=data)

Here is the buggy behaviour:

> predict(model)
       2        3        4        5        6        7        8        9       10 
3.500667 2.411196 2.627915 2.813815 2.468595 1.733852 2.114553 1.423225 1.470738 
> predict(model,newdata=data)
        1         2         3         4         5         6         7         8         9        10 
2.1628335 3.7063579 2.9781417 2.1374301 3.2582376 1.9534558 1.3670995 2.4547626 0.8448223 1.8762437

Form the newdata

#Forecast fix.
A<-c(A,rnorm(1)) #Assume we already have 1-step forecasts for A,B,C
B<-c(B,rnorm(1))
C<-c(C,rnorm(1))

newdata<-ts(cbind(A,B,C),start=start(y),freq=frequency(y))

newdata<-cbind(lag(y,-1),newdata)
colnames(newdata) <- c("y","A","B","C")

Compare forecast with model fit:

> predict(model)
       2        3        4        5        6        7        8        9       10 
3.500667 2.411196 2.627915 2.813815 2.468595 1.733852 2.114553 1.423225 1.470738 
> predict(model,newdata=newdata)
       1        2        3        4        5        6        7        8        9       10       11 
      NA 3.500667 2.411196 2.627915 2.813815 2.468595 1.733852 2.114553 1.423225 1.470738 1.102367

As you can see for historical data the forecast coincides and the last element contains the 1-step ahead forecast.

How can you handle the case where you have two lags in the same formula? `lag(y,-1)+lag(y,-2)`? — Hugh Perkins, Oct 27 '12 at 13:27
Well, then this solution does not work. You need to write your own prediction function. — mpiktas, Oct 29 '12 at 11:38
Did you consider submitting it to dynlm authors? It is a bizarre situation, that you cannot predict using dynlm. — mpiktas, Oct 29 '12 at 14:14
Hmmm, you're saying they're not going to magically monitor stackoverflow and fix bugs? I guess that's probably true! — Hugh Perkins, Oct 29 '12 at 15:49
@HughPerkins can you please share the code for prediction, if we have to use two lags to educate everyone? Thanks! — StatguyUser, Jan 09 '16 at 20:12

score 2 · Answer 2 · answered Jan 10 '16 at 18:59

Following @md-azimul-haque 's request, I dug through my 4 years old source code, and found the following appropriately named function. Not sure if this is what @md-azimul-haque is looking for?

# pass in training data, test data,
# it will step through one by one
# need to give dependent var name, so that it can make this into a timeseries
predictDyn <- function( model, train, test, dependentvarname ) {
    Ntrain <- nrow(train)
    Ntest <- nrow(test)
    # can't rbind ts's apparently, so convert to numeric first
    train[,dependentvarname] <- as.numeric(train[,dependentvarname])
    test[,dependentvarname] <- NA
    testtraindata <- rbind( train, test )
    testtraindata[,dependentvarname] <- ts( as.numeric( testtraindata[,dependentvarname] ) )
    for( i in 1:Ntest ) {
       cat("predicting i",i,"of",Ntest,"\n")
       result <- predict(model,newdata=testtraindata,subset=1:(Ntrain+i-1))
       testtraindata[Ntrain+i,dependentvarname] <- result[Ntrain + i + 1 - start(result)][1]
    }
    testtraindata <- testtraindata[(Ntrain+1):(Ntrain + Ntest),dependentvarname]
    names(testtraindata) <- 1:Ntest
    return( testtraindata )
}

1-step-ahead predictions with dynlm R package

2 Answers2