
I've got an optimization problem with these properties:

  • The goal function is expensive to compute; it can be evaluated at most around 10^4 times during the optimization.
  • There are many local optima.
  • High values are clustered around the global maximum, so at a large scale the problem is roughly convex.
  • The solutions are constrained to a known hyperbox.
  • The gradient is unknown, but intuitively, the function is smooth.
  • It's a function of up to 50 variables. The graphs below are examples of the 2D special case.

Nelder-Mead gets stuck in local optima a lot. With more than three variables, there's no chance of using brute force. I've looked into genetic algorithms, but they seem to require a lot of evaluations of the goal function.

What kind of algorithm should I be looking for?

[Two example plots of the 2D special case of the goal function]

Anna
  • Anna, this is potentially interesting here on the stats site--because as you know, many statistical procedures look for optima of functions like this--but can you draw an explicit connection with a statistical problem to convince the community not to vote to close your question as off-topic? – whuber Jun 11 '12 at 18:49
  • @whuber: I actually thought questions on numeric optimization were on-topic. The goal function is a measure of different correlation matrices, but there is no clear statistical connection. Is there a better site to ask this question on? – Anna Jun 11 '12 at 18:57
  • I am not aware of a better site, Anna. I believe the difficulty is that the field of optimization is far broader than stats or machine learning; it has its own literature, terminology, methods, and so on. Statisticians become acquainted with some optimization concepts and methods perforce, but it is rarely their specialty or primary interest. – whuber Jun 11 '12 at 19:00
  • @Anna Have you tried using [Simulated annealing](http://en.wikipedia.org/wiki/Simulated_annealing)? – Procrastinator Jun 11 '12 at 23:27
  • Why not put this on the Mathematics site? – Michael R. Chernick Jun 12 '12 at 01:16
  • There is no operations research/optimization site on Stack Exchange that is beyond its beta (or earlier) phase. Mathematics might be the best place to move this question, though. – stackovergio Jun 12 '12 at 01:38
  • http://scicomp.stackexchange.com has 84 posts on optimization so far, so I think it would be better than mathematics or here. – Rob Hyndman Jun 12 '12 at 01:59
  • @RobHyndman I agree it would be the right place to move the question, but it is still in beta, and if you merely look at the number of questions tagged 'optimization', we have 96 here. – stackovergio Jun 12 '12 at 02:19
  • @Procrastinator SA typically requires between $10^4$ and $10^5$ (or more) evaluations during its random searching before it even begins to "cool" near a reasonable optimum. It's great for functions whose *differences* can be very rapidly evaluated in terms of a small to medium-size proposed change in their arguments. (This can include functions that are expensive to evaluate *ab initio,* if one is both lucky and clever.) Otherwise, for expensive functions, SA may be one of the worst possible approaches. – whuber Jun 12 '12 at 11:46
  • @whuber Thanks for the illustration. What are better alternatives in this case? – Procrastinator Jun 12 '12 at 11:58
  • @Proc stackovergio has given a good reply (+1): scan the "Efficient Global Optimization" paper. It reports impressive results with as few as 20 evaluations in some (quasi-realistic) cases. It's hard to recommend anything specific for this question because we aren't told much about the nature of the function or its domain of definition. – whuber Jun 12 '12 at 12:01
  • (A year later) 1) Which implementation of Nelder-Mead did you try? Some have quite wrong convergence tests. 2) Did you do restarts, i.e., after reaching a (local) optimum, look around at a larger scale? (A sketch of this restart idea follows these comments.) 3) There are dozens of optimization methods in [R optim](http://cran.r-project.org/web/views/Optimization.html) alone -- more methods, more knobs, than reproducible test functions. If you could post your function or a cross-section of it, it would be easy to run various methods on it. – denis May 23 '13 at 15:51
  • related: http://stats.stackexchange.com/questions/193306/optimization-when-cost-function-slow-to-evaluate – Sycorax May 18 '16 at 23:49
  • Here is a [solution](https://github.com/paulknysh/blackbox) that you might find useful. Here is a [paper](http://arxiv.org/pdf/1605.00998.pdf) that describes the method. –  May 18 '16 at 23:46
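As a concrete illustration of the restart idea from denis's comment, here is a minimal sketch assuming SciPy's Nelder-Mead implementation (its `bounds` argument requires SciPy 1.7 or later); the toy goal function, the box, and the restart budget are placeholders, not part of the original question:

```python
# Minimal multistart Nelder-Mead sketch; f, lower, upper are placeholders.
import numpy as np
from scipy.optimize import minimize

def f(x):
    # Stand-in for the expensive goal function (to be maximized).
    return -np.sum((x - 0.5) ** 2)

dim = 2
lower, upper = np.zeros(dim), np.ones(dim)   # the known hyperbox
rng = np.random.default_rng(0)

best_x, best_val = None, -np.inf
for _ in range(20):                          # independent restarts
    x0 = rng.uniform(lower, upper, size=dim)
    # SciPy minimizes, so negate the goal function.
    res = minimize(lambda x: -f(x), x0, method="Nelder-Mead",
                   bounds=list(zip(lower, upper)))
    if -res.fun > best_val:
        best_x, best_val = res.x, -res.fun

print("best point:", best_x, "best value:", best_val)
```

Each restart costs a full Nelder-Mead run, so with a budget of roughly 10^4 evaluations this leaves room for only a limited number of restarts, especially in higher dimensions.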

1 Answer


For expensive functions without derivatives, a useful abstract framework for optimization is:

Evaluate the function at a few points (on a regular grid or not)
Repeat:
    Fit a stochastic model (a surrogate) to the data collected so far
    Validate the model with statistical tests
    Find the model's maximum and evaluate the true function there
    If the new value is better than the best one found so far, update the maximum
    Add this point to the dataset

This should also fit well with the pseudo-convexity you referred to: the surrogate captures the large-scale structure while smoothing over the local optima.
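As a concrete illustration, here is a minimal sketch of this loop with a Gaussian-process surrogate, assuming scikit-learn; the toy goal function, the box, the kernel, and the upper-confidence-bound stand-in for the "find the model's maximum" step are all illustrative choices, not prescribed by the references below:

```python
# Minimal surrogate-loop sketch; f, lower, upper are placeholders.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def f(x):
    # Stand-in for the expensive goal function (to be maximized).
    return -np.sum((x - 0.5) ** 2)

dim = 2
lower, upper = np.zeros(dim), np.ones(dim)   # the known hyperbox

# Evaluate the function at a few initial points.
X = rng.uniform(lower, upper, size=(10, dim))
y = np.array([f(x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(40):                          # expensive-evaluation budget
    # Fit the stochastic model to the data collected so far.
    gp.fit(X, y)
    # Approximate "find the model's maximum": score a cheap cloud of
    # candidates with an upper confidence bound on the surrogate.
    cand = rng.uniform(lower, upper, size=(2048, dim))
    mu, sigma = gp.predict(cand, return_std=True)
    x_new = cand[np.argmax(mu + 2.0 * sigma)]
    # Evaluate the true function there and add the point to the dataset.
    X = np.vstack([X, x_new])
    y = np.append(y, f(x_new))

best = X[np.argmax(y)]
print("best point:", best, "best value:", y.max())
```

The candidate-scoring step is cheap because only the surrogate is evaluated there; the expensive function is called just once per iteration.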

References:

Jones, D. R., Schonlau, M., and Welch, W. J. (1998). "Efficient Global Optimization of Expensive Black-Box Functions." Journal of Global Optimization 13, 455–492.

Booker, A. J., Dennis, J. E. Jr., Frank, P. D., Serafini, D. B., Torczon, V., and Trosset, M. W. (1999). "A Rigorous Framework for Optimization of Expensive Functions by Surrogates." Structural Optimization 17, 1–13.

stackovergio