
I am fairly new to machine learning and statistics, but I was wondering why Bayesian optimization is not mentioned more often online when learning about machine learning, as a way to optimize your algorithm's hyperparameters? For example, using a framework like this one: https://github.com/fmfn/BayesianOptimization

Does Bayesian optimization of your hyperparameters have any limitations or major disadvantages compared to techniques like grid search or random search?
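
A minimal sketch of what I mean, based on my reading of that project's README (the toy objective `black_box` is just a placeholder for an expensive model-training run):

```python
# Minimal sketch using the bayes_opt package from the linked repo.
from bayes_opt import BayesianOptimization

def black_box(x, y):
    # Pretend this is an expensive model-training run returning a score.
    return -x ** 2 - (y - 1) ** 2 + 1

optimizer = BayesianOptimization(
    f=black_box,
    pbounds={"x": (-2.0, 2.0), "y": (-3.0, 3.0)},  # box constraints per parameter
    random_state=1,
)
optimizer.maximize(init_points=5, n_iter=25)  # 5 random warm-up points, 25 BO steps
print(optimizer.max)  # best parameters and score found
```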

EtienneT
  • [No free lunch in search and optimization](https://en.wikipedia.org/wiki/No_free_lunch_in_search_and_optimization). In general, unless cost-function evaluation is rather costly and the dimensionality of the problem is fairly small, BO is usually not the answer. The field of [Mathematical Optimisation](https://en.wikipedia.org/wiki/Mathematical_optimization) did not become obsolete because of the discovery of Gaussian processes. – usεr11852 Aug 10 '17 at 20:57
  • Great answers. But why do you post them in comments? – Jan Kukacka Feb 02 '18 at 10:23
  • @JanKukacka Good point. I've moved my comments to an answer. – Sycorax Jun 18 '18 at 03:12

1 Answer

Compared with grid or random search, Bayesian optimization (BO) has several drawbacks:

  1. Results are sensitive to the parameters of the surrogate model, which are typically fixed at some value; this underestimates the uncertainty. The alternative is to be fully Bayesian and marginalize over the hyperparameter distributions, which can be expensive and unwieldy.
  2. It takes a dozen or so samples to fit a good surrogate surface over a two- or three-dimensional search space; increasing the dimensionality of the search space requires still more samples.
  3. Bayesian optimization itself depends on an optimizer to search the surrogate surface, which has its own costs; this inner problem is (hopefully) cheaper to evaluate than the original one, but it is still a non-convex, box-constrained optimization problem (i.e., difficult!). See the sketch after this list.
  4. Estimating the BO surrogate model itself has costs.
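
To make points 1 and 3 concrete, here is a minimal, self-contained sketch of the BO loop (my own illustration, not any particular library's API): the GP surrogate's kernel hyperparameters are fit to a point estimate by maximum likelihood rather than marginalized over (point 1), and the acquisition function is "optimized" by a crude random-candidate search standing in for the inner non-convex problem (point 3).

```python
# Sketch of a Bayesian-optimization loop (illustrative, not a library API).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x):
    # Stand-in for the expensive objective (e.g., a validation score).
    return -np.sin(3 * x) - x ** 2 + 0.7 * x

rng = np.random.default_rng(0)
bounds = (-1.0, 2.0)
X = rng.uniform(*bounds, size=(3, 1))  # a few initial random evaluations
y = f(X).ravel()

for _ in range(10):
    # Point 1: kernel hyperparameters are fit by maximum likelihood here,
    # i.e. fixed at a point estimate rather than marginalized over.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)

    # Point 3: the acquisition function must itself be optimized; this
    # crude random-candidate search stands in for that inner problem.
    candidates = rng.uniform(*bounds, size=(1000, 1))
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    improvement = mu - y.max()
    z = improvement / sigma
    ei = improvement * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement

    x_next = candidates[np.argmax(ei)].reshape(1, 1)
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next).ravel())

print(X[np.argmax(y)], y.max())  # best point found
```

Even in this toy form, each outer iteration pays for a GP refit and an acquisition search on top of the single evaluation of f that it buys.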

To put it another way, BO is an attempt to keep the number of function evaluations to a minimum and to get the most "bang for the buck" from each evaluation. This matters if you're conducting destructive tests, or running a simulation that takes an obscene amount of time to execute. But in all but the most expensive cases, apply pure random search and call it a day! (Or LIPO, if your problem is amenable to its assumptions.) Doing so can save you a number of headaches, such as having to optimize your Bayesian optimization program in turn.
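
For comparison, that baseline is a few lines with off-the-shelf tools; here is a sketch using scikit-learn's `RandomizedSearchCV` (the dataset and parameter ranges are invented for illustration):

```python
# Pure random search over SVM hyperparameters with scikit-learn.
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)  # toy data

search = RandomizedSearchCV(
    SVC(),
    param_distributions={
        "C": loguniform(1e-3, 1e3),      # sample scale parameters log-uniformly
        "gamma": loguniform(1e-4, 1e1),
    },
    n_iter=50,  # 50 random configurations; each costs one cross-validated fit
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```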

Sycorax