
If my outer CV is 5-fold, then after the process I have 5 final models, one from each outer fold. I then apply these 5 models to the whole dataset (training + validation + testing). In my case, the 5 accuracies are: .63, .95, .92, .63, .95. What does this mean? Is the model unstable, or is it overfitting? Of course, my sample size is small: 38. What I want to know is this: if new data comes in and I want to apply a single final model to it, which of the 5 should I choose as the final model? Thanks a lot.

user80518

1 Answer


(Nested) cross-validation is a way to estimate the performance of a modeling pipeline. In principle, it doesn't result in a final predictive model.

Various approaches exist to obtain a final model, the main ones being:

  1. Train one overall model on the full data set, use that model for predictions, and report the nested cross-validation estimate as its performance.
  2. Make an ensemble of the models you've constructed in the outer cross-validation, typically through bagging.
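Both options can be sketched in a few lines. The following is only an illustration under stated assumptions, not the asker's actual pipeline: the data are synthetic (38 samples, two classes), the classifier is a plain k-nearest-neighbours vote chosen as a stand-in, and all function names (`tune_k`, `nested_cv`, `ensemble_predict`, and so on) are hypothetical. The outer loop scores the whole "tune, then fit" procedure; the final model is then built separately.

```python
import random
from collections import Counter

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    """Fraction of test points that a k-NN model built on `train` classifies correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def folds(data, n_folds):
    """Yield (train, test) splits; fold i holds out every n_folds-th sample starting at i."""
    for i in range(n_folds):
        test = data[i::n_folds]
        train = [p for j, p in enumerate(data) if j % n_folds != i]
        yield train, test

def tune_k(data, candidates, n_folds):
    """Inner loop: return the candidate k with the best mean cross-validated accuracy."""
    def cv_score(k):
        return sum(accuracy(tr, te, k) for tr, te in folds(data, n_folds)) / n_folds
    return max(candidates, key=cv_score)

def nested_cv(data, candidates, outer=5, inner=3):
    """Outer loop: tune on each outer training set, score on the held-out outer fold."""
    scores, fold_models = [], []
    for train, test in folds(data, outer):
        k = tune_k(train, candidates, inner)   # tuning never sees the outer test fold
        scores.append(accuracy(train, test, k))
        fold_models.append((train, k))         # a k-NN "model" is its training set plus k
    return scores, fold_models

# Synthetic stand-in data (assumption): 19 samples around (0, 0), 19 around (3, 3).
rng = random.Random(0)
data = [((rng.gauss(c, 1.0), rng.gauss(c, 1.0)), c) for c in (0, 3) for _ in range(19)]
rng.shuffle(data)

candidates = [1, 3, 5]
scores, fold_models = nested_cv(data, candidates)
estimate = sum(scores) / len(scores)           # performance estimate of the pipeline

# Option 1: rerun the tuning on all data; report `estimate` as the expected performance.
final_k = tune_k(data, candidates, 5)
final_model = (data, final_k)

# Option 2: keep the five outer-fold models and let them vote on each new sample.
def ensemble_predict(models, x):
    votes = Counter(knn_predict(train, x, k) for train, k in models)
    return votes.most_common(1)[0][0]
```

A new sample `x` would then be classified with either `knn_predict(final_model[0], x, final_k)` or `ensemble_predict(fold_models, x)`. Note that the outer-fold scores are computed only on data each model did not see; none of the five fold models is picked as "the" final model.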

You probably want to read the answers to this related question too: Training with the full dataset after cross-validation?

Marc Claesen