I'm a bit confused about the inner and outer (external) k-fold cross-validation used for model performance evaluation.
I've read that when you are trying to validate your model, you need to do it without leaking information from the test set into the training set. Data leakage is common, for example, when you do feature selection on the whole dataset instead of on the training set alone (the same goes for feature standardization).
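To make sure I understand the leakage part, here is a minimal sketch of what I think the correct setup looks like (scikit-learn syntax; the data, selector, and classifier are just placeholders): the standardization and feature selection are fitted inside each training fold via a pipeline, rather than once on the whole dataset.

```python
# Minimal sketch: keep preprocessing inside the CV folds to avoid leakage.
# (Placeholder data and model; only the structure matters here.)
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# The scaler and feature selector are refit on each training fold only,
# because cross_val_score refits the whole pipeline per fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipe, X, y, cv=10)
print(scores.mean())
```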
But how do you prevent that when you are "learning" the best hyperparameters for your model? For example, the number of layers of the neural net that performs best on your problem?
Do I need to "inner" cross-validate them inside my cross-validation task? Or is it acceptable to select them with the outer (external) cross-validation alone, for example by iterating over the hyperparameters with a single 10-fold cross-validation and then picking the best-performing ones? (See the sketch below for the two options I have in mind.)
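To make the question concrete, here is a rough sketch of the two approaches (scikit-learn syntax; the SVC model and parameter grid are just placeholders I picked for illustration): the "flat" version picks hyperparameters from a single 10-fold cross-validation and reports that score, while the "nested" version wraps an inner hyperparameter search inside each outer evaluation fold.

```python
# Rough sketch of the two approaches I mean (placeholder model and grid).
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}

# Option A: "flat" selection -- one 10-fold CV over the grid, then keep the
# best score as the performance estimate. Is this estimate biased?
flat_search = GridSearchCV(SVC(), param_grid, cv=10)
flat_search.fit(X, y)
print("flat best CV score:", flat_search.best_score_)

# Option B: nested CV -- an inner 5-fold search for hyperparameters inside
# each of the 10 outer folds; only the outer-fold scores estimate performance.
inner_search = GridSearchCV(SVC(), param_grid, cv=5)
nested_scores = cross_val_score(inner_search, X, y, cv=10)
print("nested CV score:", nested_scores.mean())
```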
Essentially, do I need to validate my hyperparameter selection process itself to avoid some kind of bias? I know that the parameter search (e.g., the weights of your neural net) needs to be done on the training set alone, but is the same true for the hyperparameters?