I am trying to evaluate the results of a prediction obtained with the R function VAR. I have reproduced an example with two time series so that others can also implement it (the data set is read from a url). After seeing the code, I have several questions:
- When I use ndiff, I find different integration for the two variables (0 and 1), this causes a problem when using the function VARselect. For this reason, I have decreased the window size. Is this a correct approach? (see flag 1)
- After applying several statistical test,I only consider cases with no autocorrelation AND when a granger test of causality is passed. How can I prove that predicting with two time series is better than one time series? Should I compare with AR? How can I show this? Could someone tell me how to plot the comparison? (see flag 2)
- How does the predict function work? In my case, the granger test is passed for "views granger cause unemployment." When I use predict, are the values of the "views" used in the prediction of the unemployment (in this case 5 months) similarly to when you use exogenous regressors? Should I do this? How?
Should I consider any other test for VAR? At this time, I am only using VAR, later on I will consider other models. Thanks.
rm(list=ls()) library(vars) library(forecast) library(lubridate) library(fUnitRoots) library(urca) library(vars) library(aod) library(zoo) library(tseries) paired.ds <- read.table(url("http://ruthygarcia.com/others/test.txt"), header=T, row.names = NULL, stringsAsFactors = FALSE) ### Check for heteroscedasticity based on http://stats.stackexchange.com/questions/6330/when-to-log-transform-a-time-series-before-fitting-an-arima-model if ( gqtest(paired.ds$unemployment ~1)$p.value < 0.1) { paired.ds$unemployment <- log(paired.ds$unemployment) } if ( gqtest(paired.ds$views ~1)$p.value < 0.1) { paired.ds$views <- log(paired.ds$views) } ##### Create time series unemployment.ts<-ts(paired.ds$unemployment,freq=12, start=c(year(paired.ds$date[1]), month(paired.ds$date[1])) ) views.ts<-ts(paired.ds$views,freq=12, start=c(year(paired.ds$date[1]),month(paired.ds$date[1]))) ##### First model model.ts <- cbind(unemployment.ts, views.ts) #### Seasonality ## Test first for seasonality s.unemployment <- unemployment.ts s.views <- views.ts #### Fix seasonality if nsdiffs > 0 ns.unemp <- nsdiffs(unemployment.ts) if(ns.unemp > 0) { print(sprintf("Found seasonality in unemployment of %s", ns.unemp)) s.unemployment <- diff(s.unemployment,lag=frequency(s.unemployment),differences=ns.unemp) } #### Fix seasonality if nsdiffs > 0 ns.views <- nsdiffs(views.ts) if(ns.views > 0) { print(sprintf("Found seasonality in viewst of %s", ns.views)) s.views <- diff(s.views,lag=frequency(s.views),differences=ns.views) } #### Integration lag.unemployment =ndiffs(s.unemployment, alpha = 0.05, test = c("adf")) d.unemployment = s.unemployment #### Integraate of ndiffs > 0 if (lag.unemployment >0){ print(sprintf("Found integration in unemployment of %s", lag.unemployment)) d.unemployment = diff(d.unemployment, lag=lag.unemployment) } #### Integraate of ndiffs > 0 lag.views = ndiffs(s.views, alpha = 0.05, test = c("adf")) d.views = s.views if (lag.views >0){ print(sprintf("Found integration in views of %s", lag.views)) d.views = diff(d.views, lag=lag.views) } #Model 2 #### Make it stationary model2.ts <- cbind(d.unemployment, d.views) max.lag <- max(lag.unemployment, lag.views) #### see which one has bigger lag, in this particular example it is lag.views = 1 #### Here we divide data in training and testing and fix the time window. Since lag.views =1, the first element in model2.ts is NA's, so we start the ##### by the next date that is no NA, which is Feb 2008 ### ***flag 1**** model3.ts <- window(model2.ts, start=c(year(as.Date(as.yearmon(time(model2.ts))[1+ max.lag]) ), month(as.Date(as.yearmon(time(model2.ts))[1+ max.lag]) )), end = c( tail(year(as.Date(as.yearmon(time(model2.ts))) ), 6)[1], tail(month(as.Date(as.yearmon(time(model2.ts))) ), 6)[1]) ) testdata <- window(model2.ts, start=c( tail(year(as.Date(as.yearmon(time(model2.ts))) ), 5)[1], tail(month(as.Date(as.yearmon(time(model2.ts))) ), 5)[1]) ) ###### The VARselect() enables the user to determine an optimal lag length according to an information criteria or the final ###### prediction error of an empirical VAR(p) process pos.lags <- VARselect(model3.ts, lag.max = 10)$selection ## result ##AIC(n) HQ(n) SC(n) FPE(n) ##3 3 2 3 ## choose AIC var = VAR(model3.ts, p=pos.lags[1], type="const") p.value <- serial.test(var, lags.pt=10, type="PT.asymptotic") roots(var) # stable model has all roots <1 # result of roots # 0.8767457 0.8767457 0.6394563 0.6394563 0.5066303 0.5066303 # serial.test It is tested for autocorrelation in errors using a portmanteau test. The null hypothesis of no autocorrelation is rejected when the pp-value < 0.05 ### Since autocorrelation is an undesirable feature of the model, we want to look for another model that does not have autocorrelation. We want ### a p value such that the null of no autocorrelation cannot be rejected because the pp-value > 0.05 if(p.value$serial$p.value > 0.05) { ### Test for Granger causality ### It is is a statistical concept of causality that is based on prediction. According to Granger causality, ### if a signal X1 "Granger-causes" (or "G-causes") a signal X2, then past values of X1 should contain information that helps predict #### X2 above and beyond the information contained in past values of X2 alone ### does unemployment causes views? ---> NO result <- grangertest(model3.ts[,2] ~ model3.ts[,1], order=pos.lags[1]) if(result$`Pr(>F)`[2] < 0.05){ print ("**********P-VALUE VALUE LESS THAN 0.05 WE CAN SAY THAT WE CAN REJECT NON CAUSALITY, WE MAY HAVE CAUSALITY *************") #### predicting next 5 months, ***** here is my doubt! How is the predict calculated? ##### how to know two time series in better than one? p<- predict(var, n.ahead=5, ci=0.95) fcst = forecast(var, h = 5) print( accuracy( fcst, testdata)) } ### does views causes unemployment? ---> YES result <- grangertest(model3.ts[,1] ~ model3.ts[,2], order=pos.lags[1]) if(result$`Pr(>F)`[2] < 0.05){ print ("**********P-VALUE VALUE LESS THAN 0.05 WE CAN SAY THAT WE CAN REJECT NON CAUSALITY, WE MAY HAVE CAUSALITY *************") #### predicting next 5 months, ***** here is my doubt! How is the predict calculated? ##### how to know two time series in better than one? ##### **** flag 2 ***** p<- predict(var, n.ahead=5, ci=0.95) fcst = forecast(var, h = 5) print( accuracy(fcst, testdata)) plot(var) plot(p) } } ######## enf od stackoverflow