How to cross-validate monthly data using k-fold in a small dataset

Question

I would like to use k-fold on my small dataset (length = 118) and apply it to a random forest model.

However, it is a time series of monthly data, running from October 2010 to July 2020.

What is the best way of cross-validating my data in this case?

Here's the head of my data where the Date is the index column:

          Per.Change Domestic.Production.from.UKCS Import Per.GDP.Growth Average.Temperature Price.Electricity Price.Gas
2010-10-01       2.08                          3.54   5.40            0.2               10.44             43.50     46.00
2010-11-01      -3.04                          3.46   6.74           -0.1                5.52             46.40     49.66
2010-12-01       0.31                          3.54   9.00           -0.9                0.63             58.03     62.26
2011-01-01       2.65                          3.59   7.58            0.6                4.05             48.43     55.98
2011-02-01       1.52                          3.20   5.68            0.4                6.29             46.47     53.74
2011-03-01      -1.38                          3.40   5.93            0.5                6.59             51.41     60.39

score 0 · Answer 1 · answered Dec 07 '20 at 15:02

Use time series cross-validation, which respects the time dimension. This question has very good answers with visualisations. Basically, you'll do something like:

Fold 1: Training: [2010, 2011, 2012], Test: [2013]

Fold 2: Training: [2010, 2011, 2012, 2013], Test: [2014]

...

This way, your validation respects the time ordering: each fold is trained only on data that precedes its test period, so there is no data leakage from the future.
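The fold scheme above can be sketched in plain Python as an expanding-window splitter. This is only an illustration: the function name `expanding_window_splits` and the values `initial_train=36` and `test_size=12` (three years of training, one year of testing) are assumptions chosen to mirror the folds listed above, not settings taken from the question.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    n_samples: int, initial_train: int, test_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs where training rows always precede test rows."""
    start = initial_train
    while start + test_size <= n_samples:
        train_idx = list(range(0, start))                  # every row before the test block
        test_idx = list(range(start, start + test_size))   # the next `test_size` rows
        yield train_idx, test_idx
        start += test_size                                 # grow the training window


# 118 monthly rows (Oct 2010 to Jul 2020), as in the question.
# initial_train and test_size are illustrative assumptions.
folds = list(expanding_window_splits(n_samples=118, initial_train=36, test_size=12))

for i, (train_idx, test_idx) in enumerate(folds, start=1):
    print(
        f"Fold {i}: train rows 0-{train_idx[-1]}, "
        f"test rows {test_idx[0]}-{test_idx[-1]}"
    )
```

Each pair of index lists can be used to slice the rows of the data frame (sorted by date) before fitting and scoring the random forest. With these particular settings, rows that do not fill a complete final test block are left out. If scikit-learn is available, its `TimeSeriesSplit` class provides a similar expanding-window splitter.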
