I am new to time series prediction and forecasting with neural networks and am having trouble with cross validation.
I am fitting a multivariate time series with 236 monthly observations. I am using the caret package because Rob Hyndman suggests using rolling forecasting origin cross-validation for time series here: https://www.otexts.org/fpp/2/5.
caret has a function for this but, for starters, I am having trouble understanding the ins and outs of the trainControl documentation.
What does initialWindow mean in layman's terms? The documentation on time slices describes it as "the initial number of consecutive values in each training set sample".
I figured, using Rob Hyndman's approach, that I would use 235 of my total 236 observations, the last observation being the "test set", thus setting initialWindow to 235.
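To check my understanding of the slicing, here is a small sketch of the index sets I think these settings produce. It assumes that caret's createTimeSlices() interprets initialWindow, horizon and fixedWindow the same way trainControl(method = "timeslice") does, which is how I read the documentation. The vector y is only a placeholder with the same length as my data, not the real series.

```r
library(caret)

# Placeholder for the 236 monthly observations; createTimeSlices()
# only needs the length of the series, not the actual values.
y <- seq_len(236)

# The setup described above: train on 235 points, hold out the last one.
one_split <- createTimeSlices(y, initialWindow = 235, horizon = 1,
                              fixedWindow = TRUE)
length(one_split$train)       # number of train/test splits generated
range(one_split$train[[1]])   # first and last training index of split 1
one_split$test[[1]]           # index of the held-out observation

# For comparison: a shorter window leaves room for the forecast origin
# to roll forward, so more than one split is generated.
rolling <- createTimeSlices(y, initialWindow = 225, horizon = 1,
                            fixedWindow = TRUE)
length(rolling$train)
```

If this reading is right, a window of 235 gives a single split, whereas a window of 225 gives one split per remaining month.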
Here is the code I used:
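library(caret)

# Rolling forecasting origin resampling with a one-step-ahead horizon
control <- trainControl(method = "timeslice",
                        initialWindow = 235,
                        fixedWindow = TRUE,
                        horizon = 1)

# Columns 2 and 3 are the predictors; column 1 is the target
mynn <- train(mytsframe4[, 2:3], mytsframe4[, 1],
              method = "mlp",
              size = 2,
              metric = "RMSE",
              maximize = FALSE,
              trControl = control)

# Print the resampling summary
mynn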
Multi-Layer Perceptron
236 samples
2 predictor
No pre-processing
Resampling: Rolling Forecasting Origin Resampling (1 held-out with a fixed window)
Summary of sample sizes: 225, 225, 225, 225, 225, 225, ...
Resampling results across tuning parameters:
  size  RMSE           Rsquared  RMSE SD
  1     0.05837386693  NaN       0.04002651320
  3     0.05759843218  NaN       0.04774038998
  5     0.07597407274  NaN       0.03000920417
The data have been normalized, contain no missing values, and run from Feb 1995 to Sept 2014.
In summary, here are the questions I've already asked, plus a few more:
What does the initialWindow parameter mean in layman's terms?
What does the fixedWindow parameter mean in layman's terms?
How is the output of the model interpreted? More specifically, what does "size" mean?
What could be causing the NaNs in the Rsquared column?
How do I obtain the outputs/predictions so I can create unscaled forecasts?
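To make the last question concrete, here is a minimal sketch of what I mean by "unscaled". It assumes min-max normalization to [0, 1], which is only an assumption for this example, and the raw values are made up. The idea is that whatever inverse transform applies to the scaled target should also apply to the scaled predictions.

```r
# Hypothetical raw values standing in for the target column (made up).
raw <- c(102.5, 98.1, 110.4, 105.0)

# Assumed normalization: min-max scaling to the interval [0, 1].
lo <- min(raw)
hi <- max(raw)
scaled <- (raw - lo) / (hi - lo)

# Inverse transform: maps scaled values (or scaled predictions)
# back to the original units.
unscale <- function(x, lo, hi) x * (hi - lo) + lo

restored <- unscale(scaled, lo, hi)
all.equal(restored, raw)  # checks that the round trip recovers the raw values
```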