Many people suggest tuning a network's hyperparameters using Bayesian optimization (or grid search, or whatever other black-box optimization method you like), so I tried it for myself. I am not sure about the following things:
- How long should I train the network at each iteration of the Bayesian optimization? I chose to train for about a tenth of the number of epochs it would take until the network is fully trained.
- What should the objective I optimize be? I chose the minimum validation loss reached during the short training run. Should I instead fit the loss to some exponential decay function and estimate what the loss would be at the end of full training, assuming the learning curve stays smooth? (See the extrapolation sketch after this list.)
- How many iterations of the Bayesian optimization should I run, given that I have about 15 hyperparameters to tune (most of which are continuous over a small range)? (A minimal sketch of the overall setup is also below.)
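For reference, here is roughly the kind of setup I mean, sketched with scikit-optimize. The three-dimensional search space and the `short_train` function are placeholders standing in for my real hyperparameters and truncated training run, and `n_calls` is just an example number, not a recommendation:

```python
import numpy as np
from skopt import gp_minimize
from skopt.space import Real

# Placeholder search space: a few continuous hyperparameters standing in
# for the ~15 real ones.
space = [
    Real(1e-5, 1e-2, prior="log-uniform", name="learning_rate"),
    Real(0.0, 0.5, name="dropout"),
    Real(1e-6, 1e-3, prior="log-uniform", name="weight_decay"),
]

def short_train(learning_rate, dropout, weight_decay, epochs=10):
    """Placeholder for a truncated training run (about a tenth of the full
    budget). A real version would train the network for `epochs` epochs and
    return the best validation loss seen; here it is a dummy so the script
    runs end to end."""
    rng = np.random.default_rng(0)
    return (np.log10(learning_rate) + 4) ** 2 + dropout + rng.normal(0, 0.01)

def objective(params):
    # skopt passes the sampled point as a list in the order of `space`.
    lr, dropout, wd = params
    return short_train(lr, dropout, wd)

# n_calls is the total number of short training runs the optimizer gets;
# this is exactly the budget I am unsure how to choose for ~15 dimensions.
result = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best loss:", result.fun, "best params:", result.x)
```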
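And here is the kind of extrapolation I had in mind for the second question, using SciPy's `curve_fit` with a simple exponential-decay model. The loss values, epoch counts, and full-training budget below are made up purely for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def decay(t, a, k, c):
    # Exponential decay toward an asymptotic loss c.
    return a * np.exp(-k * t) + c

# Toy validation-loss curve from a short run (illustrative values only).
val_losses = np.array([2.1, 1.6, 1.3, 1.15, 1.05, 1.0, 0.96, 0.93, 0.91, 0.90])
epochs = np.arange(1, len(val_losses) + 1)

# Fit the decay model to the short run, then evaluate it at the full budget.
(a, k, c), _ = curve_fit(decay, epochs, val_losses, p0=(1.0, 0.3, val_losses[-1]))
full_budget = 100  # e.g. the number of epochs of a full training run
estimated_final_loss = decay(full_budget, a, k, c)
print(f"estimated loss after {full_budget} epochs: {estimated_final_loss:.3f}")
```

The idea would be to feed `estimated_final_loss` to the optimizer instead of the raw minimum validation loss, but I am not sure whether that is worth the extra assumption of a smooth, decay-shaped learning curve.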
Any other advice would be much appreciated as well. Thanks, Dan