Evaluating results of VAR (Vector Autoregression) using R

Question

I am trying to evaluate the results of a prediction obtained with the R function VAR. I have reproduced an example with two time series so that others can also implement it (the data set is read from a url). After seeing the code, I have several questions:

When I use ndiff, I find different integration for the two variables (0 and 1), this causes a problem when using the function VARselect. For this reason, I have decreased the window size. Is this a correct approach? (see flag 1)
After applying several statistical test,I only consider cases with no autocorrelation AND when a granger test of causality is passed. How can I prove that predicting with two time series is better than one time series? Should I compare with AR? How can I show this? Could someone tell me how to plot the comparison? (see flag 2)
How does the predict function work? In my case, the granger test is passed for "views granger cause unemployment." When I use predict, are the values of the "views" used in the prediction of the unemployment (in this case 5 months) similarly to when you use exogenous regressors? Should I do this? How?

Should I consider any other test for VAR? At this time, I am only using VAR, later on I will consider other models. Thanks.

rm(list=ls())
library(vars)
library(forecast)
library(lubridate)
library(fUnitRoots)
library(urca)
library(vars)
library(aod)
library(zoo)
library(tseries)

paired.ds <- read.table(url("http://ruthygarcia.com/others/test.txt"), header=T, row.names = NULL, stringsAsFactors = FALSE)

### Check for  heteroscedasticity based on http://stats.stackexchange.com/questions/6330/when-to-log-transform-a-time-series-before-fitting-an-arima-model

if ( gqtest(paired.ds$unemployment ~1)$p.value < 0.1) {

    paired.ds$unemployment <- log(paired.ds$unemployment)
  }

  if ( gqtest(paired.ds$views ~1)$p.value < 0.1) {

    paired.ds$views <- log(paired.ds$views)
  }


  ##### Create time series

  unemployment.ts<-ts(paired.ds$unemployment,freq=12, start=c(year(paired.ds$date[1]), month(paired.ds$date[1])) ) 

  views.ts<-ts(paired.ds$views,freq=12, start=c(year(paired.ds$date[1]),month(paired.ds$date[1])))


  ##### First model

  model.ts <- cbind(unemployment.ts, views.ts) 

  #### Seasonality 
  ## Test first for seasonality
  s.unemployment <- unemployment.ts
  s.views <- views.ts

  #### Fix seasonality if nsdiffs > 0
  ns.unemp <- nsdiffs(unemployment.ts)
  if(ns.unemp > 0) {
    print(sprintf("Found seasonality in unemployment of %s", ns.unemp))
    s.unemployment <- diff(s.unemployment,lag=frequency(s.unemployment),differences=ns.unemp)
  } 

  ####  Fix seasonality if nsdiffs > 0
  ns.views <- nsdiffs(views.ts)
  if(ns.views > 0) {
    print(sprintf("Found seasonality in viewst of %s", ns.views))
    s.views <- diff(s.views,lag=frequency(s.views),differences=ns.views)
  } 



  #### Integration
  lag.unemployment  =ndiffs(s.unemployment, alpha = 0.05, test = c("adf"))
  d.unemployment = s.unemployment

  ####  Integraate of ndiffs > 0

  if (lag.unemployment >0){
    print(sprintf("Found integration in unemployment of %s", lag.unemployment))
    d.unemployment = diff(d.unemployment, lag=lag.unemployment)
  }


  ####  Integraate of ndiffs > 0
  lag.views = ndiffs(s.views, alpha = 0.05, test = c("adf"))
  d.views = s.views

  if (lag.views >0){
    print(sprintf("Found integration in views of %s", lag.views))
    d.views = diff(d.views, lag=lag.views)
  }



  #Model 2
  #### Make it stationary

  model2.ts <- cbind(d.unemployment, d.views)
  max.lag <- max(lag.unemployment, lag.views) #### see which one has bigger lag, in this particular example it is lag.views = 1




  #### Here we divide data in training and testing and fix the time window. Since lag.views =1, the first element in model2.ts is NA's, so we start the 
  ##### by the next date that is no NA, which is Feb 2008
  ### ***flag 1****
  model3.ts <- window(model2.ts, 
                      start=c(year(as.Date(as.yearmon(time(model2.ts))[1+ max.lag]) ), month(as.Date(as.yearmon(time(model2.ts))[1+ max.lag]) )),
                      end = c( tail(year(as.Date(as.yearmon(time(model2.ts))) ), 6)[1], tail(month(as.Date(as.yearmon(time(model2.ts))) ), 6)[1])
  )
  testdata <- window(model2.ts, 
                     start=c( tail(year(as.Date(as.yearmon(time(model2.ts))) ), 5)[1], tail(month(as.Date(as.yearmon(time(model2.ts))) ), 5)[1])
  )


  ###### The VARselect() enables the user to determine an optimal lag length according to an information criteria or the final
  ###### prediction error of an empirical VAR(p) process

  pos.lags <- VARselect(model3.ts, lag.max = 10)$selection
  ## result
  ##AIC(n)  HQ(n)  SC(n) FPE(n) 
  ##3      3      2      3 

  ## choose AIC
  var = VAR(model3.ts, p=pos.lags[1], type="const")
  p.value <- serial.test(var, lags.pt=10, type="PT.asymptotic")
  roots(var) # stable model has all roots <1
  # result of roots
  # 0.8767457 0.8767457 0.6394563 0.6394563 0.5066303 0.5066303

  # serial.test It is tested for autocorrelation in errors using a portmanteau test. The null hypothesis of no autocorrelation is rejected when the pp-value < 0.05
  ### Since autocorrelation is an undesirable feature of the model, we want to look for another model that does not have autocorrelation. We want
  ### a p value such that the null of no autocorrelation cannot be rejected because the pp-value > 0.05
  if(p.value$serial$p.value > 0.05) { 
    ### Test for Granger causality 
    ### It is is a statistical concept of causality that is based on prediction. According to Granger causality, 
    ### if a signal X1 "Granger-causes" (or "G-causes") a signal X2, then past values of X1 should contain information that helps predict 
    #### X2 above and beyond the information contained in past values of X2 alone



    ### does unemployment causes views? ---> NO

    result <- grangertest(model3.ts[,2] ~ model3.ts[,1], order=pos.lags[1])
    if(result$`Pr(>F)`[2] < 0.05){

      print ("**********P-VALUE VALUE LESS THAN 0.05 WE CAN SAY THAT WE CAN REJECT NON CAUSALITY, WE MAY HAVE CAUSALITY *************")

      #### predicting next 5 months, *****  here is my doubt! How is the predict calculated? 
      ##### how to know two time series in better than one?
      p<- predict(var, n.ahead=5, ci=0.95) 
      fcst = forecast(var, h = 5)
      print( accuracy( fcst, testdata))

    }

    ### does views causes unemployment? ---> YES
    result <- grangertest(model3.ts[,1] ~ model3.ts[,2], order=pos.lags[1])
    if(result$`Pr(>F)`[2] < 0.05){

      print ("**********P-VALUE VALUE LESS THAN 0.05 WE CAN SAY THAT WE CAN REJECT NON CAUSALITY, WE MAY HAVE CAUSALITY *************")


      #### predicting next 5 months, *****  here is my doubt! How is the predict calculated? 
      ##### how to know two time series in better than one?
      ##### **** flag 2 *****
      p<- predict(var, n.ahead=5, ci=0.95) 
      fcst = forecast(var, h = 5)
      print( accuracy(fcst, testdata))
      plot(var)
      plot(p)
    }

  }
  ######## enf od stackoverflow

Evaluating results of VAR (Vector Autoregression) using R

0 Answers0