
When choosing neural network hyperparameters (say, the number of features, layers, and neurons), is the best way to train each option several times via cross-validation and then take the average performance (RMSE) on the test data?

If there is a large range of things to test, this could become very time-consuming, since so many neural networks have to be trained. Is there another way that avoids such time demands? Could you use the exact same method without cross-validation and have it still be reliable?

Gabi23

2 Answers

  • If you are tuning computationally intensive algorithms like neural networks, you wouldn't usually use $k$-fold cross-validation, because the computations would take too long. Instead, you would use held-out validation and test sets, so each candidate model is trained only once and evaluated on a single held-out validation set. In fact, this is what Andrew Ng recommends in his course.

  • To find the best parameters you usually shouldn't use grid search, but random search or, even better, Bayesian optimization. These methods let you search the hyperparameter space more efficiently.
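A minimal sketch of the random-search-with-a-single-validation-score idea. The search space names and the surrogate objective below are illustrative, not from the answer; in practice `validation_rmse` would train the network once on the training set and score it on the held-out validation set.

```python
import random

# Hypothetical search space -- names and ranges are illustrative.
space = {
    "n_layers": [1, 2, 3],
    "n_neurons": [16, 32, 64, 128],
    "learning_rate": [1e-1, 1e-2, 1e-3],
}

def sample_config(rng):
    """Draw one random combination of hyperparameters."""
    return {name: rng.choice(values) for name, values in space.items()}

def validation_rmse(config):
    # Stand-in for: train the network once, return RMSE on the
    # single held-out validation set. A toy surrogate keeps the
    # sketch runnable end to end.
    return (abs(config["n_layers"] - 2)
            + abs(config["n_neurons"] - 64) / 64
            + abs(config["learning_rate"] - 1e-2) * 10)

rng = random.Random(0)
trials = [sample_config(rng) for _ in range(20)]
best = min(trials, key=validation_rmse)
print(best)
```

Each configuration is trained once, so 20 trials cost 20 training runs, versus 20 × k runs under k-fold cross-validation.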

Tim
  • So for a selection of choices for a particular hyperparameter, I train neural networks and choose the one that performs best on the validation data. When it comes to choosing the option for the next hyperparameter, do I repeat this approach using the same validation data? After all hyperparameters have been chosen, I evaluate the finished model on a completely separate test set? Is that right? – Gabi23 Oct 09 '21 at 22:40
  • @Gabi23 you don’t choose them one by one; instead you try many different combinations of hyperparameters. In most cases hyperparameters are correlated, and you can’t change one without affecting the others. So yes. – Tim Oct 10 '21 at 06:09

I have not seen a way to get the same reliability as the approach you described with less training time.

However, I can offer some very informal thoughts. It would be best to get a second opinion on these, as this is just my experience, but here goes:

Be less careful when moving forward, and more careful when moving backward. Your goal is to get the NN to work well, and you will try many ideas, most of which will not work. When an idea does not work, it usually makes things worse. So just use a simple train/test split and skip cross-validation. Move fast and try as many things as you can.
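The "move fast" phase above can be sketched as one fixed train/test split where each idea costs exactly one training run. The data and the linear least-squares fit below are stand-ins (assumptions, not from the answer); you would swap in your network's fit/predict.

```python
import numpy as np

# Synthetic data standing in for your real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=1000)

# One fixed train/test split instead of k-fold CV: each idea is
# trained exactly once, so iteration is fast.
n_train = 800
X_tr, X_te = X[:n_train], X[n_train:]
y_tr, y_te = y[:n_train], y[n_train:]

def rmse_of(model_fn):
    """Train once, predict once, score once."""
    pred = model_fn(X_tr, y_tr, X_te)
    return float(np.sqrt(np.mean((pred - y_te) ** 2)))

# A linear least-squares fit stands in for "one idea"; replace this
# lambda with your network's training and prediction.
rmse = rmse_of(lambda Xa, ya, Xb: Xb @ np.linalg.lstsq(Xa, ya, rcond=None)[0])
```

The trade-off is that a single split gives a noisier estimate than averaged folds, which is exactly why the answer reserves cross-validation for the later, more careful phase.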

When you eventually do reach better performance, it is time to see which ideas mattered and which didn't. Then go backwards by removing some of the ideas (called ablation), and do cross-validation (if needed) to be more precise.
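A sketch of that careful backward phase: score each ablation with k-fold cross-validation so small differences are measured with less noise. The synthetic data and the least-squares model are assumptions for illustration; "removing an idea" is represented here as dropping a feature column.

```python
import numpy as np

# Synthetic data: only the first two features actually matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=200)

def cv_rmse(cols, k=5):
    """Average RMSE over k folds using only the given feature columns.
    A linear least-squares fit stands in for the trained network."""
    idx = np.arange(len(X))
    errs = []
    for fold in np.array_split(idx, k):
        tr = np.setdiff1d(idx, fold)
        w = np.linalg.lstsq(X[tr][:, cols], y[tr], rcond=None)[0]
        pred = X[fold][:, cols] @ w
        errs.append(np.sqrt(np.mean((pred - y[fold]) ** 2)))
    return float(np.mean(errs))

# Ablation: remove one component at a time and see what mattered.
full = cv_rmse([0, 1, 2, 3, 4])
without_first = cv_rmse([1, 2, 3, 4])  # drops an informative feature
without_last = cv_rmse([0, 1, 2, 3])   # drops a noise feature
```

Here the fold-averaged scores make it clear that removing the first component hurts while removing the last one is harmless, which is the kind of fine distinction a single noisy split can miss.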

psarka
  • The neural network seems to work very well, but the differences in performance between different numbers of features, layers, etc. were very small. – Gabi23 Oct 09 '21 at 22:45
  • In that case cross-validation is the way to go – psarka Oct 09 '21 at 22:49
  • Thanks. In that case I may have to run about 500 neural networks. Should I use fewer epochs to speed them up? – Gabi23 Oct 09 '21 at 22:57