
A Gaussian process (GP) is defined as a collection of random variables, any finite number of which have a joint Gaussian distribution (Rasmussen and Williams 2006). It is well known that, given observations $\left\{ \mathbf{x},\mathbf{y}\right\}$ and a covariance function such as the squared exponential $$k(x,x') = \sigma \exp\!\left(-(x-x')^{2}/l^{2}\right),$$ the response $y^{*}$ at any new input point $x^{*}$ can be predicted from multivariate normal theory as follows: $$\begin{pmatrix} \mathbf{y}\\ y^{*} \end{pmatrix} \sim N\!\left(\mathbf{0},\begin{bmatrix} K & K_{*}^{T} \\ K_{*} & K_{**} \end{bmatrix}\right), \qquad y^{*}\mid\mathbf{y} \sim N\!\left(K_{*}K^{-1}\mathbf{y},\; K_{**}-K_{*}K^{-1}K_{*}^{T}\right).$$ I want to find the input points corresponding to the maximum and minimum predicted response over a domain of interest $x\in D$; for instance, in the figure below I want to find the input points $x^{*}=a$ and $x^{*}=b$.

[Figure: the GP predicted response over the domain $D$, with the maximum and minimum attained at $x^{*}=a$ and $x^{*}=b$.]

What are the best and most efficient techniques that can be utilized for the Gaussian process? Specifically, I am looking at possibly utilizing derivative information to provide a fast algorithm for finding the points $a,b$. I would also appreciate help with cases where $x^{*} \in \mathbb{R}^{d}$. Some good references on this topic would be of great help.
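For concreteness, here is a minimal NumPy sketch of the prediction equations above (assuming a zero prior mean and noise-free observations, with a small jitter added to $K$ purely for numerical stability; the helper names `se_kernel` and `gp_posterior` are illustrative, not from any particular library):

```python
import numpy as np

def se_kernel(X1, X2, sigma=1.0, l=1.0):
    """Squared-exponential kernel k(x, x') = sigma * exp(-(x - x')^2 / l^2)."""
    sq_dists = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return sigma * np.exp(-sq_dists / l ** 2)

def gp_posterior(X, y, X_star, sigma=1.0, l=1.0, jitter=1e-8):
    """Posterior mean and covariance of y* | y, per the equations above."""
    K = se_kernel(X, X, sigma, l) + jitter * np.eye(len(X))   # K (regularized)
    K_star = se_kernel(X_star, X, sigma, l)                   # K_*
    K_ss = se_kernel(X_star, X_star, sigma, l)                # K_**
    # Cholesky solve instead of forming K^{-1} explicitly.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))       # K^{-1} y
    mean = K_star @ alpha                                     # K_* K^{-1} y
    v = np.linalg.solve(L, K_star.T)
    cov = K_ss - v.T @ v                                      # K_** - K_* K^{-1} K_*^T
    return mean, cov
```

With $\sigma$ and $l$ fitted by maximizing the likelihood, the predicted curve in the figure is simply `mean` evaluated on a grid over $D$.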

PS: The intent here is to estimate the input points that would cause a decrease or increase in the response.
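On the derivative information: for the squared-exponential kernel above, the gradient of the posterior mean with respect to $x^{*}$ is available in closed form, since only $K_{*}$ depends on $x^{*}$ (a standard identity, stated here for the kernel parameterization used in the question): $$\nabla_{x^{*}}\,\mu(x^{*}) = \left(\nabla_{x^{*}} K_{*}\right) K^{-1}\mathbf{y}, \qquad \frac{\partial k(x^{*},x_{i})}{\partial x^{*}} = -\frac{2\,(x^{*}-x_{i})}{l^{2}}\,k(x^{*},x_{i}),$$ which holds verbatim for the isotropic kernel in $\mathbb{R}^{d}$ with $(x^{*}-x_{i})$ read as a vector. The posterior mean is generally multimodal, so a single gradient descent/ascent only finds a local extremum; the standard remedy is multiple random restarts over $D$.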

Wis
  • Very, very, very closely related: http://stats.stackexchange.com/questions/193306/optimization-when-cost-function-slow-to-evaluate/193310#193310 – Sycorax Aug 10 '16 at 02:09
  • Are you able to go back in time and record historical values? Because apart from that, you just fit your model (GPR or anything else) and simply look at the maximum and minimum values it attains in your domain of interest. Otherwise you really have a question about optimal design and not about GPs. – usεr11852 Aug 10 '16 at 07:32
  • @usεr11852 I cannot collect more historical data. I am searching for an efficient optimization algorithm that helps me find $a$ and $b$, especially in a high-dimensional space. A full search algorithm is not an option here. – Wis Aug 11 '16 at 21:42
  • @raK1: So you just want to fit "something" and then read out the extrema of that estimated fit, right? How large is your $d$, by the way? Cubic splines, for example, are usually fairly fast to evaluate and handle low dimensions ($d \leq 3$) fairly well. Otherwise I see no real question: you have decided on your technique (GPs), so just fit the model and move forward with it. You don't say what you have tried yet, so it is difficult to contextualise the issue you experience. – usεr11852 Aug 11 '16 at 22:18
  • @usεr11852 My question is not about which technique to fit; I am using a Gaussian process. My question is about efficient optimization techniques that suit GP models and can efficiently find extrema. – Wis Aug 11 '16 at 22:48
  • @raK1: Fitting a Gaussian process and finding its extrema are two different tasks. For the first task, a simple BFGS will suffice at first instance. Don't complicate things unnecessarily: first try it, see if the standard procedure is good enough, and then expand on it (you still haven't said how big $d$ is, by the way). For the second task you need to evaluate the model multiple times, and it is effectively a programming task rather than a statistics task; see the sketch after these comments. – usεr11852 Aug 12 '16 at 08:20
  • You need to define a scalar "response" to be your optimization objective function. How are you taking the multidimensional predicted mean $M(x)$ and covariance $C(x)$ at a point $x \in \mathbb{R}^n$ and producing a scalar objective at $x$? That is the key thing you need to decide on if you are going to do optimization. Even in 1D it's not clear whether your picture is supposed to be based just on the mean, ignoring covariance (variance in 1D). – Mark L. Stone Aug 23 '16 at 02:01
  • @MarkL.Stone The objective function is the likelihood function, which is optimized to find the hyperparameter estimates. The response $y$ is indeed scalar for any input vector $x$; when the dimension of $x$ increases it is like having many covariates. My question concerns an efficient optimizer that can find the $x$ corresponding to a local maximum of $y$. – Wis Aug 23 '16 at 02:08
  • What are your hyperparameters? Can you explicitly show your objective function? Can you calculate its gradient with respect to the parameters being optimized? How big is the number of optimization variables (hyperparameters)? – Mark L. Stone Aug 23 '16 at 02:38
  • @MarkL.Stone The hyperparameters are $\sigma$ and $l$ in the first equation above. Maximizing the normal likelihood is simple in this case. However, the goal here is to find the extremal points after obtaining the hyperparameter estimates. – Wis Aug 23 '16 at 06:58
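Pulling the comment thread together: once $\sigma$ and $l$ are estimated, finding $a$ and $b$ reduces to minimizing and maximizing the posterior mean over $D$, for example with the multi-start BFGS approach suggested above, using the closed-form gradient derived earlier. A minimal SciPy sketch (assuming a box domain $D$, the kernel parameterization from the question, and noise-free data; `mean_and_grad` and `find_extrema` are illustrative names, and this is a sketch rather than a tuned implementation):

```python
import numpy as np
from scipy.optimize import minimize

def mean_and_grad(x_star, X, y_alpha, sigma=1.0, l=1.0):
    """Posterior mean mu(x*) = K_* K^{-1} y and its gradient w.r.t. x*.

    `y_alpha` is the precomputed vector K^{-1} y; the gradient uses the
    closed-form derivative of the squared-exponential kernel.
    """
    diff = x_star[None, :] - X                                    # (n, d)
    k_star = sigma * np.exp(-np.sum(diff**2, axis=1) / l**2)      # (n,)
    mu = k_star @ y_alpha
    grad = (-2.0 / l**2) * ((diff * k_star[:, None]).T @ y_alpha) # (d,)
    return mu, grad

def find_extrema(X, y, bounds, n_restarts=20, sigma=1.0, l=1.0, seed=0):
    """Multi-start L-BFGS-B over a box domain D given as [(lo, hi)] * d.

    Returns (argmin, argmax) of the posterior mean, i.e. estimates of a, b.
    """
    rng = np.random.default_rng(seed)
    sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    K = sigma * np.exp(-sq / l**2) + 1e-8 * np.eye(len(X))        # jitter for stability
    y_alpha = np.linalg.solve(K, y)                               # K^{-1} y, computed once
    lo, hi = np.array(bounds, dtype=float).T
    results = {}
    for sign in (1.0, -1.0):                                      # 1: minimize, -1: maximize
        def objective(x, s=sign):
            mu, g = mean_and_grad(x, X, y_alpha, sigma, l)
            return s * mu, s * g
        best = min(
            (minimize(objective, rng.uniform(lo, hi), jac=True,
                      method="L-BFGS-B", bounds=bounds)
             for _ in range(n_restarts)),
            key=lambda r: r.fun)
        results[sign] = best.x
    return results[1.0], results[-1.0]                            # a (argmin), b (argmax)
```

In one dimension this is cheap; in higher dimensions the number of restarts typically has to grow, and if evaluating the mean itself becomes the bottleneck, the Bayesian-optimization material linked in the first comment is the natural next step.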

0 Answers