Quite recently I stumbled upon several posts here on nested cross-validation, and they showed me how flawed my understanding of the procedure was. Now, trying to put all the pieces together, I still have some doubts.
One of my questions is the following: suppose I want to come up with a model for my data and task. I know that the purpose of nested cross-validation is to estimate the generalization performance of the modelling procedure (whatever the final model may be) on the data I have. I also know that if all the inner validation folds yield the same model with (approximately) the same hyperparameters, the procedure is stable and the estimated generalization performance is reasonably reliable. In this case I can re-run the inner selection procedure on the entire dataset to obtain the final model. However, if the inner folds select different best models (via an optimization procedure such as a grid search), all I can say is that the generalization estimate coming out of this nested cross-validation run is reliable, provided it is "stable" in the outer loop (negligible differences/low variance across outer folds), according to this answer. In this case I would do exactly the same as above to find the final model, since all the candidate models perform roughly equally (please correct me if I am wrong).
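For concreteness, here is a minimal sketch of what I mean by checking the stability of the inner selection, assuming scikit-learn; the SVC estimator, the grid, and the iris data are just placeholders for my actual setup:

```python
# Minimal sketch (placeholder estimator/grid/data): run the inner grid
# search inside each outer fold and record which hyperparameters win.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10]}

outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
chosen, scores = [], []
for train_idx, test_idx in outer_cv.split(X):
    # Inner loop: model selection on the outer-training portion only
    inner = GridSearchCV(SVC(), param_grid, cv=5)
    inner.fit(X[train_idx], y[train_idx])
    chosen.append(inner.best_params_)                     # winning C in this fold
    scores.append(inner.score(X[test_idx], y[test_idx]))  # outer performance estimate

print(chosen)  # if these agree across folds, the selection is stable
print(scores)  # low variance here makes the estimate more trustworthy
```

If `chosen` contains the same parameters in every outer fold, I would call the procedure stable in the sense above.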
However, what if the selected models are different and the outer loop reports very different generalization performance estimates? According to this answer and many others, I would then need to stabilize the procedure. Is that right? In particular, what I would do is either add some regularization or increase the number of repetitions of each inner k-fold cross-validation (other ways of stabilizing the optimization procedure may be possible, I guess, though I am not fully convinced by this step). Is this reasoning correct?
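What I have in mind for the repetitions option, sketched with scikit-learn's `RepeatedKFold` as the inner splitter (Ridge and the diabetes data are, again, just placeholders):

```python
# Sketch (placeholder estimator/data): repeat the inner k-fold so that the
# hyperparameter choice is averaged over several random partitions, which
# should reduce the variance of the selection.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, RepeatedKFold, cross_val_score

X, y = load_diabetes(return_X_y=True)
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}  # regularization strength is in the grid

# Inner CV: 5 folds repeated 3 times -> each candidate is scored on 15 splits
inner_cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
search = GridSearchCV(Ridge(), param_grid, cv=inner_cv)

# Outer CV: generalization estimate of the whole selection procedure
outer_scores = cross_val_score(search, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=1))
print(outer_scores)  # the spread of these is what tells me whether things stabilized
```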
And finally, assuming I chose to increase the number of repetitions of the inner k-fold cross-validations, when I move on to obtaining the final model (by re-running the inner cross-validation procedure on the whole dataset), should I use the same number of repeats? I would say so, but I am not sure: this answer suggests that repetitions are very useful for the outer loop, but it does not say the same about the inner loop, which seems quite counterintuitive to me.
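The final-model step I am describing, i.e. re-running the same inner procedure (same grid, same repeated splitter) on the entire dataset, would then look like this (again just a sketch with placeholder estimator and data):

```python
# Sketch (placeholder estimator/data): the final model comes from re-running
# the *same* inner selection procedure, with the same number of repeats,
# on the whole dataset; the nested-CV score estimates its performance.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RepeatedKFold

X, y = load_diabetes(return_X_y=True)
final_search = GridSearchCV(
    Ridge(),
    {"alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=RepeatedKFold(n_splits=5, n_repeats=3, random_state=0),  # same repeats as inner loop
)
final_search.fit(X, y)
final_model = final_search.best_estimator_  # refit on all data with the winning alpha
print(final_search.best_params_)
```

My question is whether `n_repeats` here should match whatever I used in the inner loop of the nested procedure.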