Questions tagged [bayesian-optimization]

Bayesian optimization is a family of global optimization methods that use previously computed values of the function to infer where its optimum plausibly lies. Its applications include computer experiments and hyper-parameter optimization for machine learning models.

Some variants of Bayesian optimization are especially appealing because they strive to keep the total number of function evaluations to a minimum. This is desirable when function evaluations are very expensive, either because they take a lot of computing power or because they require physical experiments.

Some methods require no knowledge about the function at all (derivatives, functional form, etc.). This is useful when the function has no known closed form (such as the response surface of a model with respect to its hyper-parameters) or is so complex that computing derivatives is impractical.
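As a rough illustration of the ideas above, here is a minimal sketch of a Bayesian optimization loop in pure Python. It is not taken from any of the questions or papers listed on this page. The Gaussian-process surrogate with a squared-exponential kernel, the expected-improvement rule for choosing the next point, the grid of candidates, and every name and default (`bayes_opt`, `length_scale=0.3`, `jitter=1e-6`, and so on) are assumptions made for illustration only.

```python
import math

def rbf(a, b, length_scale=0.3):
    """Squared-exponential kernel between two scalars (length scale is an assumption)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a small symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_T(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(xs, ys, jitter=1e-6):
    """Fit a GP with constant mean; the jitter keeps the kernel matrix well conditioned."""
    mu = sum(ys) / len(ys)
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_T(L, solve_lower(L, [y - mu for y in ys]))
    return L, alpha, mu

def gp_predict(xs, model, x_new):
    """Posterior mean and standard deviation of the fitted GP at x_new."""
    L, alpha, mu = model
    k_star = [rbf(a, x_new) for a in xs]
    mean = mu + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 0.0)  # rbf(x, x) == 1
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement over the best value seen so far (minimization)."""
    if std <= 0.0:
        return max(best - mean, 0.0)
    z = (best - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mean) * cdf + std * pdf

def bayes_opt(f, lo, hi, n_init=3, n_iter=10, n_grid=201):
    """Minimize f on [lo, hi] using only n_init + n_iter evaluations of f."""
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    xs = [lo + (hi - lo) * (i + 0.5) / n_init for i in range(n_init)]
    ys = [f(x) for x in xs]
    for _ in range(n_iter):
        model = gp_fit(xs, ys)
        best = min(ys)
        candidates = [g for g in grid if g not in xs]
        # Evaluate f next where the surrogate predicts the largest expected improvement.
        x_next = max(candidates,
                     key=lambda g: expected_improvement(*gp_predict(xs, model, g), best))
        xs.append(x_next)
        ys.append(f(x_next))
    i = ys.index(min(ys))
    return xs[i], ys[i], len(ys)

# Example: a cheap quadratic stands in for an expensive black-box function.
best_x, best_y, n_evals = bayes_opt(lambda x: (x - 0.7) ** 2, 0.0, 1.0)
print(best_x, best_y, n_evals)
```

The loop only ever calls `f` itself, never its derivatives, and the evaluation budget is fixed in advance by `n_init + n_iter`. In practice one would normally reach for an established library and tune the kernel hyper-parameters rather than fix them as done here.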

Some key resources about Bayesian optimization include the following papers:

Shahriari et al., "Taking the Human Out of the Loop: A Review of Bayesian Optimization" (2015).

Jones et al., "Efficient Global Optimization of Expensive Black-Box Functions" (1998).

154 questions
73
votes
6 answers

Optimization when Cost Function Slow to Evaluate

Gradient descent and many other methods are useful for finding local minima in cost functions. They can be efficient when the cost function can be evaluated quickly at each point, whether numerically or analytically. I have what appears to me to…
19
votes
2 answers

Advantages of Particle Swarm Optimization over Bayesian Optimization for hyperparameter tuning?

There's substantial contemporary research on Bayesian Optimization (1) for tuning ML hyperparameters. The driving motivation here is that a minimal number of data points is required to make informed choices about what points are worthwhile to try…
Sycorax • 76,417 • 20 • 189 • 313
12
votes
1 answer

What are some of the disadvantages of Bayesian hyperparameter optimization?

I am fairly new to machine learning and statistics, but I was wondering why Bayesian optimization is not mentioned more often online as a way to optimize your algorithm's hyperparameters when learning machine learning. For example using a framework like this…
12
votes
2 answers

Ill-conditioned covariance matrix in GP regression for Bayesian optimization

Background and problem: I am using Gaussian Processes (GP) for regression and subsequent Bayesian optimization (BO). For regression I use the gpml package for MATLAB with several custom-made modifications, but the problem is general. It is a…
10
votes
1 answer

Bayesian optimization for non-Gaussian noise

A black box function $f: \mathbb{R}^n \rightarrow \mathbb{R}$, which is evaluated pointwise subject to Gaussian noise, i.e., $f(x) + \mathcal{N}(\mu(x),\sigma(x)^2)$, can be minimized using Bayesian optimization where a Gaussian Process is used as a…
Johnb • 101 • 2
10
votes
1 answer

Bayesian optimization or gradient descent?

When and why use Bayesian optimization, instead of gradient descent? Which one is better for which cases?
9
votes
1 answer

What's the difference between Bayesian Optimization (Gaussian Processes) and Simulated Annealing in practice

Both processes seem to be used to estimate the maximum value of an unknown function, and both obviously have different ways of doing so. But in practice, are the two methods essentially interchangeable? Where would I want to use one over the…
canyon289 • 399 • 2 • 9
7
votes
2 answers

Intuitive Understanding of Expected Improvement for Gaussian Process

I am learning Bayesian Optimization and came across expected improvement. My question is: are we searching for the point in the Gaussian Process model whose expected value (determined by mean and confidence) shall be decreased the most if sampled…
7
votes
1 answer

Practical implementation details of Bayesian Optimization

I'm giving Bayesian Optimization a go, following Snoek, Larochelle, and Adams [http://arxiv.org/pdf/1206.2944.pdf], using GPML [http://www.gaussianprocess.org/gpml/code/matlab/doc/]. I've implemented the Expected Improvement acquisition function…
6
votes
2 answers

Question of understanding regarding Bayesian Optimization, Gaussian process and acquisition function

I'm trying to understand Bayesian optimization and I struggle a lot with all the methods involved. Hence, I have some short questions: We start with a prior function, which is a Gaussian process. A Gaussian process is something like a normal…
Ben • 2,032 • 3 • 14 • 23
6
votes
1 answer

What is the relation between a surrogate function and an acquisition function?

A surrogate function is a simpler function than the objective function to evaluate. An acquisition function is used to propose sampling points. In the context of Bayesian optimisation and Gaussian processes, what's the intuition behind the…
5
votes
1 answer

Bayesian hyperparameter optimization + cross-validation

I want to use Bayesian optimization to search a space of hyperparameters for a neural network model. My objective function for this optimization is validation set accuracy. In addition, I want to perform cross-validation such that I can get a good…
5
votes
1 answer

Why does Bayesian optimization work?

Bayesian optimization is used to optimize costly black-box functions. The idea is to use a surrogate model to model the black-box function and then an acquisition function is used to find the next point of evaluation. The goal is to get very close…
grok • 221 • 1 • 6
5
votes
1 answer

Why do you need a separate criterion (Sequential model-based global optimization) in hyperparameter tuning?

In the paper 'Algorithms for Hyper-Parameter Optimization' (pdf), where they explain the 'Sequential Model-based Global optimization method (SMBO)', the authors comment that SMBO algorithms differ in what criterion they use to obtain the…
5
votes
1 answer

Optimizing a "black box" function: Linear Regression or Bayesian Optimization... what's the difference?

Goal: I have a function $f(x,y)=z$ (two variables for illustration only) which I know almost nothing about--it has a compact domain which I can determine, it is non-negative, and bounded above. My goal is to find the maximum value $f(x,y)$ takes on…
TravisJ • 310 • 2 • 11