If my outer CV is 5-fold, then after the process I have 5 final models. I then apply these 5 final models, one from each outer fold, to the whole dataset (training + validation + testing). In my case, the 5 resulting accuracies are 0.63, 0.95, 0.92, 0.63, and 0.95. What does this mean? Is the model unstable, or is it overfitting? Admittedly, my sample size is small (38). What I want to know is: when new data comes in and I want to apply a single final model to it, which one should I choose as that final model? Thanks a lot.
1 Answer
(Nested) cross-validation is a way to estimate the performance of a modeling pipeline. In principle, it doesn't result in a final predictive model.
Various approaches exist to obtain a final model, the main ones being:
- Train one overall model on the full data set, use it for predictions, and report the nested cross-validation estimate as its expected performance.
- Make an ensemble of the models you've constructed in the outer cross-validation, typically through bagging.
You probably want to read the answers to this related question too: "Training with the full dataset after cross-validation?"
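To make the first approach concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the data are synthetic (38 one-dimensional points standing in for the sample in the question), the model is a toy k-nearest-neighbour classifier, and the helper names (`make_folds`, `knn_predict`, `select_k`, `nested_cv`, `final_model`) are made up for this example. The point it tries to show is that the outer loop only scores the pipeline, while the model used on new data is refit on all 38 samples.

```python
import random
from statistics import mean

def make_folds(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (labels are 0/1)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def accuracy(train, test, k):
    """Fraction of test points that a k-NN model built on train labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def split(data, fold):
    """Return (rows outside the fold, rows inside the fold)."""
    held = set(fold)
    inside = [data[i] for i in fold]
    outside = [data[i] for i in range(len(data)) if i not in held]
    return outside, inside

def select_k(data, candidates, rng, n_folds=3):
    """Inner CV: pick the candidate k with the best mean validation accuracy."""
    folds = make_folds(len(data), n_folds, rng)

    def score(k):
        return mean(accuracy(*split(data, fold), k) for fold in folds)

    return max(candidates, key=score)

def nested_cv(data, candidates, rng, n_outer=5):
    """Outer CV: score the whole pipeline (tuning + fitting) on held-out folds."""
    scores = []
    for fold in make_folds(len(data), n_outer, rng):
        train, test = split(data, fold)
        k = select_k(train, candidates, rng)  # tuning sees the outer training part only
        scores.append(accuracy(train, test, k))
    return scores

rng = random.Random(0)
# Synthetic stand-in for the 38 samples (assumption: two classes, one feature).
data = [(rng.gauss(2.0 * label, 1.0), label) for label in (0, 1) for _ in range(19)]
candidates = [1, 3, 5]  # hypothetical hyperparameter grid

# Step 1: performance estimate for the pipeline, not for any single fold model.
outer_scores = nested_cv(data, candidates, rng)
estimate = mean(outer_scores)

# Step 2: the model for new data is the same pipeline rerun on all the data.
final_k = select_k(data, candidates, rng)

def final_model(x):
    return knn_predict(data, x, final_k)
```

With only 38 samples, each outer test fold holds 7 or 8 points, so the per-fold scores can swing widely; that is one reason to report the mean (and spread) of `outer_scores` rather than pick any one of the five fold models. In practice one would probably reach for a library instead of hand-rolled loops, for instance scikit-learn's `GridSearchCV` wrapped in `cross_val_score`.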

Marc Claesen