
I have a black-box system which accepts a data instance $\boldsymbol{x}$ and outputs a corresponding $y$. I do not know the exact mathematical formula of the system. What I want to do is maximize $y$ by tweaking $\boldsymbol{x}$. Since I don't have the exact formula of the system, I cannot use gradient descent to do this. One idea I have is to add random noise to $\boldsymbol{x}$, which results in $\hat{\boldsymbol{x}}$, and then sample the corresponding $\hat{y}$ from the system. By doing this I could use Bayesian Optimization to find an $\hat{\boldsymbol{x}}$ which maximizes $\hat{y}$.
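For concreteness, here is a rough sketch of the idea, assuming something like scikit-optimize's `gp_minimize`; the `system` function and the bounds below are just placeholders for my real black box and feature ranges:

```python
import numpy as np
from skopt import gp_minimize  # scikit-optimize; just one possible BO implementation

def system(x):
    """Placeholder for the black box: takes a feature vector, returns y."""
    x = np.asarray(x)
    return -np.sum((x - 0.3) ** 2)  # stand-in; the real formula is unknown

# One (low, high) pair per feature; placeholder ranges.
bounds = [(-1.0, 1.0)] * 5

# gp_minimize minimizes, so negate the output to maximize y.
result = gp_minimize(lambda x: -system(x), bounds, n_calls=30, random_state=0)
print("best x:", result.x, "best y:", -result.fun)
```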

My question is: Are there any other methods to tackle this problem? Can you point me to any papers/books/tutorials?

Thank you

MarsPlus
  • Some of the content of this thread will apply (admittedly most of it is about BO). http://stats.stackexchange.com/questions/193306/optimization-when-cost-function-slow-to-evaluate/193310#193310 – Sycorax Mar 14 '17 at 17:02
  • @Sycorax Thank you very much! I will look at that answer. – MarsPlus Mar 14 '17 at 17:03
  • Also, the dirt-simple approach of random search has some nice properties. I can't find the specific paper that compares it to BO, but such a paper does exist. If I find it, I'll post an answer. – Sycorax Mar 14 '17 at 17:09
  • @Tavrock Thanks for your comment. Can you clarify what a uniform set of $x$ is? Does that mean I should add white Gaussian noise to generate such $x$? – MarsPlus Mar 14 '17 at 17:11
  • @Tavrock That's essentially what BO does, but it tries to choose the points $x$ a little more efficiently than just a uniform grid, since black box evaluations are often expensive. – Danica Mar 14 '17 at 17:25
  • Oh I see. I am sorry I forgot to mention that the input $\boldsymbol{x}$ is not a scalar. It is a vector of real-valued features. – MarsPlus Mar 14 '17 at 17:25
  • Knowing that you are dealing with $Y=f(\vec{x})$ versus $Y=f(x)$ is a huge difference. What is the size of your input vector? – Tavrock Mar 14 '17 at 17:37
  • Oh, duh! And PSO can compare nicely to BO. See: http://stats.stackexchange.com/questions/194056/advantages-of-particle-swarm-optimization-over-bayesian-optimization-for-hyperpa?rq=1 – Sycorax Mar 14 '17 at 17:45
  • @Tavrock In my case, the size of the input vector generally does not exceed 20. – MarsPlus Mar 14 '17 at 17:49
  • @Sycorax Thanks for the link. I will have a look at that. – MarsPlus Mar 14 '17 at 17:49

1 Answer


Derivative-free optimization methods solve this type of problem, where you can view your objective function as a black box. Bayesian optimization is one such derivative-free method. It helps if you know any other structural information about your objective function, e.g., can you assume convexity? Surrogate models can be used to estimate an approximate gradient, e.g., take a few points, fit a quadratic function, and use the model to do a local search in the direction of its negative gradient. There is a vast literature on derivative-free methods (for a review of algorithms, see http://thales.cheme.cmu.edu/dfo/comparison/dfo.pdf). Another example of a derivative-free method is Nelder-Mead: https://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method .
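A minimal sketch of that surrogate idea (the step size, sampling radius, and diagonal-quadratic model here are arbitrary illustrative choices, not part of any standard algorithm):

```python
import numpy as np

def surrogate_gradient_step(f, x, step=0.1, radius=0.05, n_samples=30, rng=None):
    """One local-search step for maximizing the black box f using a cheap
    quadratic surrogate (constant + linear + diagonal quadratic terms)."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    # Query the black box at a few points near the current iterate.
    X = x + radius * rng.standard_normal((n_samples, d))
    y = np.array([f(xi) for xi in X])
    # Least-squares fit of y ~ c + b·x + a·x² (elementwise squares).
    Phi = np.hstack([np.ones((n_samples, 1)), X, X**2])
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    b, a = coef[1:1 + d], coef[1 + d:]
    grad = b + 2.0 * a * x  # gradient of the surrogate at x
    # Step uphill on the surrogate (i.e., along the negative gradient of -f).
    return x + step * grad / (np.linalg.norm(grad) + 1e-12)
```

Nelder-Mead is available off the shelf, e.g. SciPy's `minimize(lambda x: -f(x), x0, method="Nelder-Mead")` if you are maximizing `f`.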

Of course, because very little information is known about the objective function, it is difficult to solve problems with a high input dimension (compared to regular optimization problems) -- you would need to consider the number of function evaluations you can afford and the size of the input vector you're optimizing.

jagdish