3

I have an OLS model in a time series framework. Apart from several independent variables, I have one that is a lagged version of the Y variable.

Is LOOCV on this model valid?

It seems not to be OK, but is there any justification, such as references, for (or against) performing LOOCV on this type of model?

amoeba
  • 1
    In time series, order matters, so it does not make sense to train on the future and test on the past. Here is a nice blog post on approaches to validating a time series model: https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/ – Jon Feb 15 '18 at 19:38

3 Answers

4

It depends on what model you are using. If you can write the model as a (nonlinear) autoregression, then yes, LOOCV is ok. It sounds like your model is of the form

$$ y_t = f(y_{t-1}, x_t, z_t, \varepsilon_t)$$

where $x_t$ and $z_t$ are exogenous variables and $\varepsilon_t$ is a white noise error process. That would fit within the models discussed in Bergmeir, Hyndman & Koo (2018), where we prove that under some conditions CV can work for time series models.
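As a rough illustration of what LOOCV looks like for such a model, here is a minimal sketch that assumes $f$ is linear, with a single exogenous variable and simulated data. The function name `loocv_mse_lagged_ols`, the coefficients, and the sample size are all made up for the example. The series is first embedded so that each row pairs $y_t$ with its predictors $(y_{t-1}, x_t)$, and then one row at a time is left out.

```python
import numpy as np

def loocv_mse_lagged_ols(y, x):
    """LOOCV mean squared error for y_t = a + b*y_{t-1} + c*x_t + e_t, fitted by OLS."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Embed the series: row i holds the target y_t and its predictors (1, y_{t-1}, x_t).
    target = y[1:]
    design = np.column_stack([np.ones(len(target)), y[:-1], x[1:]])
    n_rows = len(target)
    squared_errors = np.empty(n_rows)
    for i in range(n_rows):
        keep = np.arange(n_rows) != i  # leave out row i only
        beta, *_ = np.linalg.lstsq(design[keep], target[keep], rcond=None)
        squared_errors[i] = (target[i] - design[i] @ beta) ** 2
    return squared_errors.mean()

# Simulated data (illustrative values only): AR(1) with one exogenous regressor
# and white-noise errors with standard deviation 0.5.
rng = np.random.default_rng(0)
n_obs = 200
x = rng.normal(size=n_obs)
y = np.zeros(n_obs)
for t in range(1, n_obs):
    y[t] = 0.5 * y[t - 1] + 1.0 * x[t] + rng.normal(scale=0.5)

loocv_mse = loocv_mse_lagged_ols(y, x)
print(loocv_mse)
```

With white-noise errors, as in this simulation, the printed value should land close to the error variance (0.25 here). If the residuals of the fitted model were autocorrelated, the conditions mentioned above would not hold and this estimate should not be trusted.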

Rob Hyndman
2

Is LOOCV on this model valid?

Naively? Totally not.

You'd be using data from the future to assess the past. Your performance estimate would be completely wrong.
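To make the "data from the future" point concrete, here is a small sketch (the helper `loo_folds` is hypothetical, written just for this illustration) that lists which LOOCV folds train on observations occurring after the held-out one:

```python
def loo_folds(n_samples):
    """Yield (train_indices, test_index) for plain leave-one-out CV."""
    for i in range(n_samples):
        train = [j for j in range(n_samples) if j != i]
        yield train, i

n_samples = 6
# A fold "uses the future" if any training index comes after the test index.
folds_with_future = [test for train, test in loo_folds(n_samples) if max(train) > test]
print(folds_with_future)  # [0, 1, 2, 3, 4]
```

Every fold except the one that holds out the last observation trains on later data.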

Firebug
0

Time series models are statistical models in which the data are an ordered sequence of values of a variable at equally spaced time intervals. (See: NIST/SEMATECH e-Handbook of Statistical Methods.)

LOOCV and k-fold CV are not appropriate for time series, as they do not operate within the constraints of the definition of time series models. In LOOCV or k-fold CV, you would shuffle your data (mixing past values with future values) and split the data into partitions. Each partition would lose its sequential order, and the values would thus lose their significance as observations occurring over time.

Simply put, LOOCV and k-fold CV are appropriate for models that assume the data are independent and identically distributed (iid).

Alternatives to LOOCV and k-fold CV are time series CV methods. After a quick search, I believe this post describes the methodology.

Also, depending on the tools you use for modeling, there may already be time-series-specific CV tools. See the following example:

Note that unlike standard cross-validation methods, successive training sets are supersets of those that come before them.
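The expanding-window behaviour described in that note can be sketched in plain Python. This is only an illustration under assumed conventions (equal-sized test blocks taken from the end of the series), and `expanding_window_splits` is a hypothetical helper, not the API of any particular library:

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) where each training set extends the previous one."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too few samples for the requested number of splits")
    first_test_start = n_samples - n_splits * test_size
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train = list(range(test_start))  # everything before the test block
        test = list(range(test_start, test_start + test_size))
        yield train, test

splits = list(expanding_window_splits(n_samples=12, n_splits=3))
for train, test in splits:
    print("train:", train, "test:", test)
```

Each test block lies strictly after its training set, so the model is never evaluated on observations that precede the data it was fitted on.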

Jon