I clearly have a major misunderstanding of k-fold cross-validation. Suppose you have some training data and you use 5-fold cross-validation to train a model with it. Unless I am very much mistaken, you now have five models. So how do you decide which model to use in practice? You may have some test data, but picking a model based on that data is rightly considered cheating (the test data should always be truly independent).
Does this answer your question? [How to choose a predictive model after k-fold cross-validation?](https://stats.stackexchange.com/questions/52274/how-to-choose-a-predictive-model-after-k-fold-cross-validation) – Ben Reiniger May 03 '20 at 23:26
1 Answer
Cross-validation (say, k-fold) generally serves the following purposes:
- Tuning your hyper-parameters. Here you use only the training data: apply CV, compute the average validation-set performance for each hyper-parameter setting (k models and k validation sets per setting), and pick the best setting. Once chosen, retrain on the whole training set with those hyper-parameters and evaluate on the test data. So there is exactly one model at test time (see the first sketch after this list).
- Estimating the test performance. This is typically done when data is scarce and you don't want to set aside a separate test set with only a small number of samples. No hyper-parameter search is done here (unless an inner CV is applied). You choose a model, apply k-fold CV, obtain k validation scores, and average them (or predict the whole dataset out-of-fold and compute a single cumulative score afterwards, mainly when per-fold scores are not meaningful, e.g. a correlation score under LOOCV). The result is an estimate of the performance you would get on test data you do not yet have access to. Here you do end up with k models, but that doesn't matter, since there is no separate test set to apply them to (see the second sketch after this list).
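
A minimal sketch of the first use (hyper-parameter tuning), assuming scikit-learn; the synthetic dataset and the `C` grid are illustrative only, not part of the original answer:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

# Hypothetical data: a synthetic binary classification problem.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 5-fold CV over the training data picks the hyper-parameter setting
# with the best mean validation score.
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)

# refit=True (the default) retrains a single model on the whole training
# set with the chosen hyper-parameters; only that one model sees the test set.
print("best C:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```

The five fold-wise models are discarded; they only exist to score each candidate setting.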
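
And a minimal sketch of the second use (estimating generalisation performance) under the same assumptions: scikit-learn, a synthetic dataset, and no held-out test set:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, cross_val_predict

# Hypothetical small dataset where a separate test split would be wasteful.
X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=1000)

# Average of the k validation-fold scores: the usual CV estimate.
scores = cross_val_score(model, X, y, cv=5)
print("mean CV accuracy:", np.mean(scores))

# Alternative: pool the out-of-fold predictions and score them once,
# useful when per-fold scores are awkward (e.g. correlation under LOOCV).
pooled = cross_val_predict(model, X, y, cv=5)
print("pooled CV accuracy:", accuracy_score(y, pooled))
```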

gunes