13

I've fit a model with several independent variables, one of which is the lag of the dependent variable, using the dynlm package.

Assuming I have 1-step-ahead forecasts for my independent variables, how do I get 1-step-ahead forecasts for my dependent variables?

Here is an example:

library(dynlm)

y<-arima.sim(model=list(ar=c(.9)),n=10) #Create AR(1) dependant variable
A<-rnorm(10) #Create independant variables
B<-rnorm(10)
C<-rnorm(10)
y<-y+.5*A+.2*B-.3*C #Add relationship to independant variables 
data=cbind(y,A,B,C)

#Fit linear model
model<-dynlm(y~A+B+C+L(y,1),data=data)

#Forecast
A<-c(A,rnorm(1)) #Assume we already have 1-step forecasts for A,B,C
B<-c(B,rnorm(1))
C<-c(C,rnorm(1))
y=window(y,end=end(y)+c(1,0),extend=TRUE)
newdata<-cbind(y,A,B,C)
predict(model,newdata)

And here is an example using the dyn package, which works.

library(dyn)

#Fit linear model
model<-dyn$lm(y~A+B+C+lag(y,-1),data=data)

#Forecast
predict(model,newdata)the dyn packages, which works:
Zach
  • 22,308
  • 18
  • 114
  • 158
  • Using only `dynlm` package will not provide forecasts for your dependent variables. Providing forecasts for your dependent variables will require a model to explain them and probably additional data. I suggest you to read something about multivariate regression such as "Applied Multivariate Statistical Analysis" by Johnson and Wichern. or a course on forecasting: http://www.duke.edu/~rnau/411home.htm – deps_stats Jan 31 '11 at 22:32
  • 2
    @deps_stats The dependent variable is what I want to forecast. I'm assuming that I already have forecasts for my independent variables. In my example code, y is the dependent variable I am trying to forecast, and A,B,C are the independent variables, which I already have forecasts for. If you run the example code I posted, you will understand the nature of my problem. – Zach Jan 31 '11 at 23:31
  • @Zach: Nice Kaggle rating! (I clicked through the link in your mouse-over profile) – Hugh Perkins Oct 30 '12 at 04:03

2 Answers2

13

Congratulations, you have found a bug. Prediction for dynlm with new data is broken if lagged variables are used. To see why look at the output of

predict(model)
predict(model,newdata=data)

The results should be the same, but they are not. Without newdata argument, the predict function basically grabs model element from the dynlm output. With newdata argument predict tries to form new model matrix from newdata. Since this involves parsing formula supplied to dynlm and the formula has function L, which is defined only internaly in function dynlm, the incorrect model matrix is formed. If you try to debug, you will see, that the lagged dependent variable is not being lagged in the case of newdata argument is supplied.

What you can do is to lag the dependent variable and include it in the newdata. Here is the code illustrating this approach. I use set.seed so it would be easily reproducible.

library(dynlm)

set.seed(1)
y<-arima.sim(model=list(ar=c(.9)),n=10) #Create AR(1) dependant variable
A<-rnorm(10) #Create independant variables
B<-rnorm(10)
C<-rnorm(10)
y<-y+.5*A+.2*B-.3*C #Add relationship to independant variables 
data=cbind(y,A,B,C)

#Fit linear model
model<-dynlm(y~A+B+C+L(y,1),data=data)

Here is the buggy behaviour:

> predict(model)
       2        3        4        5        6        7        8        9       10 
3.500667 2.411196 2.627915 2.813815 2.468595 1.733852 2.114553 1.423225 1.470738 
> predict(model,newdata=data)
        1         2         3         4         5         6         7         8         9        10 
2.1628335 3.7063579 2.9781417 2.1374301 3.2582376 1.9534558 1.3670995 2.4547626 0.8448223 1.8762437 

Form the newdata

#Forecast fix.
A<-c(A,rnorm(1)) #Assume we already have 1-step forecasts for A,B,C
B<-c(B,rnorm(1))
C<-c(C,rnorm(1))

newdata<-ts(cbind(A,B,C),start=start(y),freq=frequency(y))

newdata<-cbind(lag(y,-1),newdata)
colnames(newdata) <- c("y","A","B","C")

Compare forecast with model fit:

> predict(model)
       2        3        4        5        6        7        8        9       10 
3.500667 2.411196 2.627915 2.813815 2.468595 1.733852 2.114553 1.423225 1.470738 
> predict(model,newdata=newdata)
       1        2        3        4        5        6        7        8        9       10       11 
      NA 3.500667 2.411196 2.627915 2.813815 2.468595 1.733852 2.114553 1.423225 1.470738 1.102367 

As you can see for historical data the forecast coincides and the last element contains the 1-step ahead forecast.

mpiktas
  • 33,140
  • 5
  • 82
  • 138
2

Following @md-azimul-haque 's request, I dug through my 4 years old source code, and found the following appropriately named function. Not sure if this is what @md-azimul-haque is looking for?

# pass in training data, test data,
# it will step through one by one
# need to give dependent var name, so that it can make this into a timeseries
predictDyn <- function( model, train, test, dependentvarname ) {
    Ntrain <- nrow(train)
    Ntest <- nrow(test)
    # can't rbind ts's apparently, so convert to numeric first
    train[,dependentvarname] <- as.numeric(train[,dependentvarname])
    test[,dependentvarname] <- NA
    testtraindata <- rbind( train, test )
    testtraindata[,dependentvarname] <- ts( as.numeric( testtraindata[,dependentvarname] ) )
    for( i in 1:Ntest ) {
       cat("predicting i",i,"of",Ntest,"\n")
       result <- predict(model,newdata=testtraindata,subset=1:(Ntrain+i-1))
       testtraindata[Ntrain+i,dependentvarname] <- result[Ntrain + i + 1 - start(result)][1]
    }
    testtraindata <- testtraindata[(Ntrain+1):(Ntrain + Ntest),dependentvarname]
    names(testtraindata) <- 1:Ntest
    return( testtraindata )
}
Hugh Perkins
  • 4,279
  • 1
  • 23
  • 38