Who first proposed Bayesian optimisation with Gaussian processes?

Question

From what I understand, the 'standard' approach to Bayesian Optimisation uses a Gaussian process for the prior (as opposed to more recent proposals like TPE or Bayesian Optimisation with random forests; please correct me if any of this is wrong).

However, I'm having a hard time finding out who first proposed this. My statistical background is very limited, so I find it difficult to evaluate to what extent old papers use essentially the same approach, especially when they use different terminology.

A couple of candidates are:

Kushner 1962
A Versatile Stochastic Model of a Function of Unknown and Time Varying Form
Journal of Mathematical Analysis and Applications

Kushner 1964
A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise
Journal of Basic Engineering

Močkus 1975
On Bayesian methods for seeking the extremum
Optimization Techniques IFIP Technical Conference

Žilinskas 1978
On statistical models for multimodal optimization
Series Statistics

O'Hagan 1978
Curve Fitting and Optimal Design for Prediction
Journal of the Royal Statistical Society. Series B (Methodological)

As well as earlier and later works by Močkus and Žilinskas, some of it in Russian.

In addition, there's Krige and the Kriging literature, which as far as I can tell were not concerned with optimisation?

Of these, only O'Hagan mentions Gaussian processes explicitly. Kushner and Žilinskas discuss Gaussian random variables, functions and fields, but I'm not sure whether those are the same as a Gaussian process. (Wikipedia says that Gaussian processes and one-dimensional Gaussian random fields are the same thing.)

My impression is that the earliest forms of Bayesian optimisation modelled the prior distribution with Wiener processes/Brownian motion. According to Wikipedia, Wiener processes are Gaussian processes, so does that mean that this is simply a stronger assumption that later proved unnecessary? Is the choice between Wiener processes and general Gaussian processes significant for the performance of Bayesian optimisation, or was the switch motivated more by practical/pragmatic reasons?

A more specific form of my question is: when I use Bayesian optimisation with Gaussian processes in a modern software package like GPyOpt or Emukit, what's the literature underpinning that?

score 0 · Answer 1 · answered Feb 22 '21 at 00:19

Kushner (1964) is probably the first to use the probability of improvement (PI) acquisition function: H. J. Kushner. A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. J. Basic Engineering, 86:97–106, 1964.

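Bayesian optimization builds on Gaussian process regression (GPR). Note that a Gaussian process and Gaussian process regression are related but distinct concepts: a Gaussian process is a probability distribution over functions, whereas GPR is the procedure of conditioning such a prior on observed data to obtain a posterior.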
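To make the relationship between GPR and the PI acquisition function concrete, here is a minimal NumPy sketch of one Bayesian optimisation step. It is only an illustration under stated assumptions: the squared-exponential kernel, the length scale, the toy objective `sin(3x)` and the margin `xi` are all choices made for this example, and none of them comes from Kushner's paper (which, as the question notes, worked with a Wiener process model). The function names are hypothetical and are not the API of GPyOpt or Emukit.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential covariance between 1-D input arrays (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """GP regression: condition a zero-mean GP prior on the observed data."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v ** 2, axis=0)  # prior variance k(x, x) is 1 for this kernel
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def probability_of_improvement(mean, std, best, xi=0.01):
    """P(f(x) > best + xi) under the Gaussian posterior (maximisation)."""
    z = (mean - best - xi) / std
    return np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])

# Toy data: three evaluations of an assumed objective sin(3x) on [0, 1].
x_train = np.array([0.1, 0.4, 0.9])
y_train = np.sin(3 * x_train)
x_test = np.linspace(0, 1, 101)

mean, std = gp_posterior(x_train, y_train, x_test)
pi = probability_of_improvement(mean, std, y_train.max())
x_next = x_test[np.argmax(pi)]  # candidate for the next evaluation
```

Here the Gaussian process is the prior over functions, `gp_posterior` is the regression step, and `probability_of_improvement` turns the resulting posterior into a rule for choosing the next point. In a full loop, `x_next` would be evaluated, appended to the training data, and the two steps repeated.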
