
I am training a multi-layer perceptron (MLP) with 4 hidden layers. I selected the best hyper-parameters with HParams by the following steps (a rough sketch of the search is shown after the list):

  1. Training the model for each combination of parameters, e.g. {'dropout_rate_of_l1': 0.1, 'dropout_rate_of_l2': 0.6, 'dropout_rate_of_l3': 0.3, 'dropout_rate_of_l4': 0}; there are about 3500 different combinations in total;
  2. Using 20% of the samples in the training set as a validation set during this process;
  3. Training 300 steps for each parameter combination and saving the best model, i.e. the one with the lowest error on the validation set;
  4. Taking the parameter combination with the lowest validation error across all combinations as the final hyper-parameters.
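
For concreteness, here is a minimal sketch of how such a grid search could look with the TensorBoard HParams plugin and Keras. The data (`X_train`, `y_train`), input size, layer widths, the 4-value grid per layer, and the use of 300 epochs in place of 300 steps are all illustrative assumptions, not the actual ~3500-combination setup from my experiment:

```python
import itertools
import os

import numpy as np
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

# Placeholder data -- replace with the real training set.
X_train = np.random.rand(1000, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(1000, 1)).astype("float32")

# Illustrative grid; the real search had about 3500 combinations.
RATES = [0.0, 0.1, 0.3, 0.6]
HP_DROPOUT = [hp.HParam(f"dropout_rate_of_l{i}", hp.Discrete(RATES)) for i in range(1, 5)]

with tf.summary.create_file_writer("logs/hparam_tuning").as_default():
    hp.hparams_config(
        hparams=HP_DROPOUT,
        metrics=[hp.Metric("best_val_loss", display_name="Best validation loss")],
    )

def build_mlp(rates):
    """MLP with 4 hidden layers, each followed by a dropout layer with its own rate."""
    layers = [tf.keras.Input(shape=(20,))]
    for r in rates:
        layers.append(tf.keras.layers.Dense(64, activation="relu"))
        layers.append(tf.keras.layers.Dropout(r))
    layers.append(tf.keras.layers.Dense(1, activation="sigmoid"))
    model = tf.keras.Sequential(layers)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

for run_id, rates in enumerate(itertools.product(RATES, repeat=4)):
    run_dir = f"logs/hparam_tuning/run-{run_id}"
    os.makedirs(run_dir, exist_ok=True)
    model = build_mlp(rates)
    # Step 3: keep the checkpoint with the lowest validation error for this run.
    ckpt = tf.keras.callbacks.ModelCheckpoint(
        f"{run_dir}/best.h5", monitor="val_loss", save_best_only=True
    )
    # Steps 1-2: train on 80% of the training data, validate on the remaining 20%.
    history = model.fit(
        X_train, y_train, epochs=300, validation_split=0.2,
        callbacks=[ckpt], verbose=0,
    )
    # Step 4: log the best validation loss of this combination for comparison.
    best_val = min(history.history["val_loss"])
    with tf.summary.create_file_writer(run_dir).as_default():
        hp.hparams({h: r for h, r in zip(HP_DROPOUT, rates)})
        tf.summary.scalar("best_val_loss", best_val, step=1)
```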

Finally, I got dropout rates of [0, 0, 0, 0] for the 4 dropout layers. Then I trained my model with early stopping, with patience equal to 20. The details can be found at https://tensorboard.dev/experiment/0kGL4vOuRpamHzALyGna1Q/#hparams
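
A minimal sketch of this final training run, again assuming Keras: all dropout rates are 0, which is the same as having no dropout layers at all, and early stopping uses patience = 20 as above; the data, layer widths, and epoch budget are placeholders:

```python
import numpy as np
import tensorflow as tf

# Placeholder data -- replace with the real training set.
X_train = np.random.rand(1000, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(1000, 1)).astype("float32")

# Selected configuration: all dropout rates are 0, i.e. an MLP without dropout.
model = tf.keras.Sequential(
    [tf.keras.Input(shape=(20,))]
    + [tf.keras.layers.Dense(64, activation="relu") for _ in range(4)]
    + [tf.keras.layers.Dense(1, activation="sigmoid")]
)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Early stopping: stop once val_loss has not improved for 20 consecutive
# epochs, and roll back to the weights of the best epoch.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=20, restore_best_weights=True
)
model.fit(
    X_train, y_train, epochs=1000, validation_split=0.2,
    callbacks=[early_stop], verbose=0,
)
```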

My question is: is it reasonable to train an MLP with early stopping but without any dropout layers?

Belter
  • If you have enough data this is certainly reasonable. It is not clear that early stopping is necessary either. Did you try it without early stopping? – J. Delaney Feb 19 '22 at 12:49
  • @J.Delaney I have tried training the model without early stopping, and I observed the classic "overfitting curve": the error on the training set kept decreasing, while the error on the validation set decreased at the beginning and then started to increase after some steps. – Belter Feb 19 '22 at 14:09

1 Answer


Yes, this can certainly happen. Dropout is not always better or necessary; it is just one means of regularization among others.

gunes