Is it necessary for the observations in a data set to be IID in order to use cross-validation on it? If so, why? Could you explain in the context of classification using a decision tree?

gunes
learner

1 Answer

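Yes, standard cross-validation assumes i.i.d. samples, which is what lets it split the data freely (typically at random) into training and validation sets. This holds regardless of the model, so it applies equally to a decision tree classifier. When the samples are dependent, as with the temporal dependency in time series datasets, the splitting has to be modified to respect that dependency. Otherwise there will be data leakage: the validation samples are no longer independent of the training samples, and the validation score becomes overly optimistic. For an example of validation in the non-i.i.d. case, see time series cross validation.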
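To make the difference between the two splitting strategies concrete, here is a minimal pure-Python sketch. It assumes the sample index doubles as the time order, and the helper names (`shuffled_kfold`, `expanding_window_split`, `looks_into_future`) are made up for illustration. In practice you would probably reach for scikit-learn's `KFold` and `TimeSeriesSplit` rather than hand-rolled generators; this only shows what each kind of split does to the ordering.

```python
import random


def shuffled_kfold(n_samples, n_splits, seed=0):
    """Plain k-fold on shuffled indices: reasonable when samples are i.i.d."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_splits folds.
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


def expanding_window_split(n_samples, n_splits):
    """Forward-chaining split: every test block lies strictly after its training data."""
    test_size = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = n_samples - (n_splits - k + 1) * test_size
        yield list(range(train_end)), list(range(train_end, train_end + test_size))


def looks_into_future(train, test):
    """True if some training index comes later than the earliest test index."""
    return max(train) > min(test)


n = 12  # hypothetical number of time-ordered observations
kfold_leaks = [looks_into_future(tr, te) for tr, te in shuffled_kfold(n, 3)]
forward_leaks = [looks_into_future(tr, te) for tr, te in expanding_window_split(n, 3)]

print("shuffled k-fold trains on the future:", kfold_leaks)
print("expanding window trains on the future:", forward_leaks)
```

With time-ordered data, the shuffled folds let the model (a decision tree or anything else) train on observations that come after the ones it is validated on, which is the leakage described above. The expanding-window variant never does, at the cost of using less training data in the early folds.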

gunes