How to pick the best model with cross validation?

Question

Based on my understanding, leave-one-out cross-validation (LOOCV) holds one sample out as the test set, fits a model to the remaining data, calculates the prediction error on the held-out sample, and repeats the procedure for each of the n samples. Since I am using partial least squares (PLS) regression, LOOCV is a good approach for determining the number of components, but what about validation? Which is the best model out of the n models produced by the LOOCV procedure? I read here that cross-validation is a "way of estimating the generalization performance of models generated by a particular procedure". That's fine, but what I am interested in is to finally get a model that I can apply to some other dataset without doing the whole procedure (i.e. PLS) again.

score 3 · Accepted Answer · answered Oct 29 '15 at 18:17

"What I am interested in is to finally get a model that I can apply to some other dataset without doing the whole procedure (i.e. PLS) again."

I think this is somewhat misguided; model selection (e.g. choosing the number of components) should be viewed as an integral part of the model fitting procedure, so you should repeat it independently every time you fit a model to a new dataset.

Note that if cross-validation is used to choose the model, the cross-validation score of the selected model is an optimistically biased performance estimate. It is better to use nested cross-validation, where the outer cross-validation is used for performance estimation and the inner cross-validation is used for model selection, independently in each fold of the outer cross-validation.
