
I have $n$ instances in my data and I will do 5-fold cross validation on it (like in the picture below):

[figure: the data split into 5 folds, with each fold used once as the test set]

But when I read about "repeated cross-validation", I think it will give me exactly the same answers, because it's the same data, the same folds and everything.

Am I right?
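For concreteness, this is roughly what I mean by one run of 5-fold CV (a minimal scikit-learn sketch; the model and toy data are just placeholders for my actual setup):

```python
# A minimal sketch of the single-run 5-fold CV I have in mind (scikit-learn
# assumed; the estimator and toy data are placeholders for the real setup).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=100, random_state=0)  # "n instances"
kf = KFold(n_splits=5, shuffle=False)  # the same folds on every re-run

for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    print(f"fold {fold}: accuracy = {model.score(X[test_idx], y[test_idx]):.3f}")
```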

Silverfish
floyd

2 Answers


But when I read about "repeated cross-validation", I think it will give me exactly the same answers, because it's the same data, the same folds and everything. Am I right?

No. Between repetitions you effectively reshuffle your data, so each repetition puts different cases into different folds.
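For illustration, here is a minimal sketch (assuming scikit-learn; the toy data and parameters are placeholders) showing that the folds have a different composition in each repetition:

```python
# Minimal sketch (scikit-learn assumed): every repetition reshuffles the
# data before splitting, so fold membership changes between repetitions.
import numpy as np
from sklearn.model_selection import RepeatedKFold

X = np.arange(20).reshape(-1, 1)  # 20 toy instances

rkf = RepeatedKFold(n_splits=5, n_repeats=2, random_state=42)
for i, (train_idx, test_idx) in enumerate(rkf.split(X)):
    rep, fold = divmod(i, 5)
    print(f"repetition {rep}, fold {fold}, test cases: {test_idx}")
# The test indices of fold k in repetition 0 differ from those in repetition 1.
```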

Firebug

@Firebug is right that the data are shuffled between repetitions, so the folds have a different composition each time.

However, if the models are stable (i.e. the same model parameters result irrespective of the small changes in the training data due to including/excluding a few cases), you still get the same prediction for the same test case. If, on the other hand, your models are sensitive to the small changes in the training data between the different folds (i.e. unstable), then you'll get different predictions for the same test case across the runs.

Repeated cross validation allows you to measure this aspect of model stability.
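As an illustration only (assuming scikit-learn, with a placeholder model and toy data rather than anything from the question), repeated CV lets you collect several predictions for the same test case and look at their spread:

```python
# Sketch (scikit-learn assumed) of using repeated CV to look at model stability:
# record the prediction each surrogate model makes for a given test case, then
# check how much those predictions vary across repetitions. The case itself is
# identical every time it is tested, so the spread reflects model (in)stability.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold

X, y = make_classification(n_samples=200, random_state=0)
n_splits, n_repeats = 5, 20
cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=1)

# predictions[i, r] = predicted probability for case i in repetition r
predictions = np.full((len(y), n_repeats), np.nan)
for i, (train_idx, test_idx) in enumerate(cv.split(X, y)):
    rep = i // n_splits
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    predictions[test_idx, rep] = model.predict_proba(X[test_idx])[:, 1]

per_case_spread = predictions.std(axis=1)  # variation across repetitions per case
print("mean prediction spread across repetitions:", per_case_spread.mean())
```

In a single CV run this per-case spread cannot be computed at all, because every case is tested exactly once.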

cbeleites unhappy with SX
  • But I think this is the same job as ordinary cross validation; ordinary CV checks for stability, right? – floyd Jun 12 '16 at 15:41
  • No, ordinary (= one run, each case is tested exactly once) CV cannot distinguish between model stability and variance due to the variance in the test cases. So you cannot tell whether variance in the predictions is due to instability or due to using a different test set. You can, however, check the stability of the model (parameters); depending on the data and model, that is more or less easy and makes more or less sense. With repeated CV you can directly measure the effect of model stability on the predictions. – cbeleites unhappy with SX Jun 14 '16 at 07:40
  • Why can ordinary CV not distinguish between model stability and variance due to the variance in the test cases? – floyd Jun 21 '16 at 03:29
  • Because samples are nested in the surrogate models: each case is evaluated by exactly one surrogate model, so you don't know whether the variance comes from the model or from the case. – cbeleites unhappy with SX Jun 24 '16 at 19:51
  • @cbeleites I would like to understand why repeated cross validation measures model stability. When we repeat cross validation, our training and test sets are still random. So how does this translate into "keeping the test case the same while training on different data", which would let us measure model stability? I have asked this question here, and it would be wonderful if you could help. Thanks! https://stats.stackexchange.com/questions/551242/how-does-repeated-k-fold-cross-validation-identify-model-instability – woowz Nov 06 '21 at 21:06