
I am fairly new to machine learning and statistics, but I was wondering why Bayesian optimization is not mentioned more often online when learning about machine learning, as a way to optimize your algorithm's hyperparameters? For example, using a framework like this one: https://github.com/fmfn/BayesianOptimization

Does Bayesian optimization of your hyperparameters have any limitations or major disadvantages compared to techniques like grid search or random search?
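
A minimal sketch of what I mean, based on my reading of that project's README (the toy objective `black_box` is just a placeholder for an expensive model-training run):

```python
# Minimal sketch using the bayes_opt package from the linked repo.
from bayes_opt import BayesianOptimization

def black_box(x, y):
    # Pretend this is an expensive model-training run returning a score.
    return -x ** 2 - (y - 1) ** 2 + 1

optimizer = BayesianOptimization(
    f=black_box,
    pbounds={"x": (-2.0, 2.0), "y": (-3.0, 3.0)},  # box constraints per parameter
    random_state=1,
)
optimizer.maximize(init_points=5, n_iter=25)  # 5 random warm-up points, 25 BO steps
print(optimizer.max)  # best parameters and score found
```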

EtienneT
  • [No free lunch in search and optimization](https://en.wikipedia.org/wiki/No_free_lunch_in_search_and_optimization). In general, unless cost-function evaluation is rather costly and the dimensionality of the problem is fairly small, BO is usually not the answer. The field of [Mathematical Optimisation](https://en.wikipedia.org/wiki/Mathematical_optimization) did not become obsolete because of the discovery of Gaussian processes. – usεr11852 Aug 10 '17 at 20:57
  • Great answers. But why do you post them in comments? – Jan Kukacka Feb 02 '18 at 10:23
  • @JanKukacka Good point. I've moved my comments to an answer. – Sycorax Jun 18 '18 at 03:12

1 Answer

Compared with grid or random search, Bayesian optimization (BO) has several drawbacks:

  1. Results are sensitive to the parameters of the surrogate model, which are typically fixed at some value; this underestimates the uncertainty. The alternative is to be fully Bayesian and marginalize over the hyperparameter distributions, which can be expensive and unwieldy.
  2. It takes a dozen or so samples to fit a good surrogate surface over a two- or three-dimensional search space; increasing the dimensionality of the search space requires still more samples.
  3. Bayesian optimization itself depends on an optimizer to search the surrogate surface, which has its own costs; this inner problem is (hopefully) cheaper to evaluate than the original one, but it is still a non-convex, box-constrained optimization problem (i.e., difficult!). See the sketch after this list.
  4. Estimating the BO surrogate model itself has costs.
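
To make points 1 and 3 concrete, here is a minimal, self-contained sketch of the BO loop (my own illustration, not any particular library's API): the GP surrogate's kernel hyperparameters are fit to a point estimate by maximum likelihood rather than marginalized over (point 1), and the acquisition function is "optimized" by a crude random-candidate search standing in for the inner non-convex problem (point 3).

```python
# Sketch of a Bayesian-optimization loop (illustrative, not a library API).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x):
    # Stand-in for the expensive objective (e.g., a validation score).
    return -np.sin(3 * x) - x ** 2 + 0.7 * x

rng = np.random.default_rng(0)
bounds = (-1.0, 2.0)
X = rng.uniform(*bounds, size=(3, 1))  # a few initial random evaluations
y = f(X).ravel()

for _ in range(10):
    # Point 1: kernel hyperparameters are fit by maximum likelihood here,
    # i.e. fixed at a point estimate rather than marginalized over.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)

    # Point 3: the acquisition function must itself be optimized; this
    # crude random-candidate search stands in for that inner problem.
    candidates = rng.uniform(*bounds, size=(1000, 1))
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    improvement = mu - y.max()
    z = improvement / sigma
    ei = improvement * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement

    x_next = candidates[np.argmax(ei)].reshape(1, 1)
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next).ravel())

print(X[np.argmax(y)], y.max())  # best point found
```

Even in this toy form, each outer iteration pays for a GP refit and an acquisition search on top of the single evaluation of f that it buys.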

To put it another way, BO is an attempt to keep the number of function evaluations to a minimum and to get the most "bang for the buck" from each evaluation. This matters if you're conducting destructive tests, or running a simulation that takes an obscene amount of time to execute. But in all but the most expensive cases, apply pure random search and call it a day! (Or LIPO, if your problem is amenable to its assumptions.) Doing so can save you a number of headaches, such as having to optimize your Bayesian optimization program in turn.
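
For comparison, that baseline is a few lines with off-the-shelf tools; here is a sketch using scikit-learn's `RandomizedSearchCV` (the dataset and parameter ranges are invented for illustration):

```python
# Pure random search over SVM hyperparameters with scikit-learn.
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)  # toy data

search = RandomizedSearchCV(
    SVC(),
    param_distributions={
        "C": loguniform(1e-3, 1e3),      # sample scale parameters log-uniformly
        "gamma": loguniform(1e-4, 1e1),
    },
    n_iter=50,  # 50 random configurations; each costs one cross-validated fit
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```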

Sycorax