Reading this, I see that cross-validation involves training, validation, and test sets. Why do we need three subsets?
I realized that if we could reduce the variance of the model-performance estimate, we wouldn't need a separate test set. And we could reduce that variance by merging the validation and test sets and measuring performance on the merged set.
So I wonder: if the model selection is not so tightly contested that we risk picking the "best" model purely by chance, would it be better to just merge the validation and test sets and use something like bootstrapping or CV to estimate the variance of the model performance?
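To make that concrete, here is a minimal sketch of the procedure I have in mind, assuming a scikit-learn setup; the dataset, the two candidate models, and the bootstrap settings are placeholders of my own, not anything from the original question:

```python
# Sketch: fit candidates on the training set, then bootstrap the merged
# "validation + test" set to estimate the variance of each model's performance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# One training set and one held-out set (the merged validation + test data).
X_train, X_held, y_train, y_held = train_test_split(
    X, y, test_size=0.4, random_state=0
)

candidates = {
    "logreg": LogisticRegression(max_iter=1000).fit(X_train, y_train),
    "rf": RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train),
}

n_boot = 1000
for name, model in candidates.items():
    y_pred = model.predict(X_held)
    # Resample the held-out set with replacement and recompute the score
    # each time, giving a distribution (and hence a variance) per model.
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_held), size=len(y_held))
        scores.append(accuracy_score(y_held[idx], y_pred[idx]))
    scores = np.asarray(scores)
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

The idea would then be to pick the candidate with the best mean score and report its mean together with the bootstrap spread.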
But then again, when the models are competing with each other very closely, merging the validation and test sets would give us more data and therefore a better chance of picking the truly best model.
So I think that, as long as we report the variance of the model performance, it would be better to just merge the validation and test data, pick the best model, and report that variance.
Is this conclusion right, or are there holes in it?