Suppose I have decided to evaluate the following model selection procedure (let's call it PROC(1)):
START PROCEDURE:
For alpha in [0, 1, 2, ..., 1000]: compute the K-fold cross-validation error of M(alpha), the model parameterized by the hyperparameter alpha.
Pick the alpha* for which M(alpha*) has the best cross-validation error. Fit M(alpha*) on the full data.
END PROCEDURE
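For concreteness, here is a minimal sketch of PROC(1) in Python/scikit-learn. The ridge model, the squared-error metric, and K=5 are my own illustrative assumptions, not part of the procedure above; substitute whatever family M(alpha) you actually use.

```python
# Minimal sketch of PROC(1): select alpha by K-fold CV, refit on all data.
# Ridge regression and squared error are assumptions made for illustration.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def proc_1(X, y, alphas=tuple(range(0, 1001)), k=5):
    """Select alpha* by K-fold CV, then refit M(alpha*) on the full data."""
    cv_errors = []
    for alpha in alphas:
        scores = cross_val_score(Ridge(alpha=alpha), X, y,
                                 scoring="neg_mean_squared_error", cv=k)
        cv_errors.append(-scores.mean())               # K-fold CV error of M(alpha)
    best = int(np.argmin(cv_errors))
    final_model = Ridge(alpha=alphas[best]).fit(X, y)  # fit M(alpha*) on the full data
    return final_model, alphas[best], cv_errors[best]
```

Note that the third return value, the winning CV error, is exactly the quantity discussed next.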
We know the cross-validation error of M(alpha*) is going to be optimistically biased, i.e. biased downwards relative to the true generalization error of PROC(1), because the same cross-validation scores used to choose alpha* are then reported as its error: taking the minimum over many noisy estimates gives a number that is too low on average.
Thus, we need to use nested cross-validation to get an honest estimate of the generalization error of PROC(1). Let us suppose we apply nested cross-validation to PROC(1) and determine that it is not a good model selection procedure. We could try another procedure, call it PROC(2) (perhaps expanding or narrowing the grid of hyperparameters, or enlarging or shrinking the family of models), and estimate its generalization error in the same way.
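Under the same assumed setup as above, nested cross-validation can be sketched by wrapping the alpha search in an outer loop; the synthetic dataset below is just a placeholder.

```python
# Sketch of nested CV for PROC(1): the outer loop scores the *procedure*.
# Each outer training fold runs the full alpha search; the held-out outer
# fold is never touched by that search. Data here is a synthetic placeholder.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_regression(n_samples=200, n_features=10, noise=1.0, random_state=0)

inner = GridSearchCV(Ridge(), {"alpha": list(range(0, 1001))},
                     scoring="neg_mean_squared_error", cv=5)             # inner CV: picks alpha*
outer_scores = cross_val_score(inner, X, y,
                               scoring="neg_mean_squared_error", cv=5)   # outer CV: scores PROC(1)
print("Nested-CV estimate of PROC(1)'s generalization error:", -outer_scores.mean())
```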
The problem is that if we select the best procedure from {PROC(1), PROC(2), ..., PROC(M)}, the nested cross-validation estimate for the selected procedure, PROC(J), will no longer be unbiased, because we have again chosen the minimum of several noisy estimates. It would seem we need yet another level of nesting ("nested nested" cross-validation), but that only recreates the problem one level up: which set of procedures do we compare there? A small sketch of where the new bias enters follows below.
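To make the regress concrete, here is a hypothetical sketch (the candidate grids, data, and procedure names are made up): choosing the best of several procedures by their nested-CV scores is itself a selection over noisy estimates, so the winner's score is again optimistic.

```python
# Hypothetical illustration: selecting among procedures by their nested-CV
# scores re-uses those scores for selection, so the winner's reported score
# is again an optimistic estimate of its generalization error.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_regression(n_samples=200, n_features=10, noise=1.0, random_state=0)

candidate_procs = {                                      # made-up hyperparameter grids
    "PROC(1)": {"alpha": list(range(0, 1001))},
    "PROC(2)": {"alpha": [10.0 ** p for p in range(-3, 4)]},
}
nested_error = {}
for name, grid in candidate_procs.items():
    inner = GridSearchCV(Ridge(), grid, scoring="neg_mean_squared_error", cv=5)
    scores = cross_val_score(inner, X, y, scoring="neg_mean_squared_error", cv=5)
    nested_error[name] = -scores.mean()

best_proc = min(nested_error, key=nested_error.get)      # this min() is where the new selection bias enters
```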
And it seems unrealistic to hope to get the best PROC on the "first try," so to speak. So what are the strategies for dealing with this issue?