Finding the maximum of a function $f(x)$ without analytically evaluating $f'(x)$

Question

I'm an experimental physicist, trying to automate a relatively simple (but sensitive) optimization in my measurements that is currently done completely manually and takes up a lot of my time. I figure that machine learning should have ample tools for what I need to do, but my knowledge is quite limited and Google isn't helping me thus far. I've taken away all of the physics/measurement related details and distilled the following scenario, which I hope is general enough.

There is some function $f(x)$ that I want to maximize. However, I can only evaluate $f(x)$; I cannot evaluate its derivative explicitly. Moreover, I cannot sample a large range of $x$; if $f(x) < {\rm threshold}$, I am in trouble (and it takes me over a day to recover). Luckily, I have one starting value $x_0$ for which I know that that $f(x_0) > {\rm threshold}$ even before evaluating, and I can guess some initial step size $\varepsilon$ for which $f(x_0 + \varepsilon) > {\rm threshold}$ will also hold (however, I don't know if $f(x_0 + \varepsilon) >$ or $< f(x_0)$ before evaluating). In addition to that I know that the function is roughly parabolic. Could someone suggest an algorithmic / adaptive / feedback protocol to find the $x$ that maximizes $f(x)$ up to some tolerance $x_{tol}$? Note that evaluating $f(x)$ is somewhat costly, but that is not my primary concern at this stage.

So far I've found the golden-section search, but that requires choosing some range $(a,b)$ over which you want to maximize which I cannot do; I can't start from some wide range as that might bring me below my tolerance, and I can't figure out #x_T$, the $x$ for which I go below my tolerance, without actually evaluating. Another option would be something like a gradient descent using finite difference for the derivative, but that sounds quite inefficient?

For context, how I currently do it manually is as follows: I evaluate $f(x_0)$ and then $f(x_0 + \varepsilon)$. If this leads to a decrease, I evaluate $f(x_0-\varepsilon)$ instead. Based on the gradient (essentially I just look at if there are large changes in $f(x)$ w.r.t its distance from the threshold I cannot cross), I either increase or decrease $\varepsilon$ and continue searching in the same direction until a maximum is found, which I notice because $f(x)$ starts decreasing. I then go back to that maximum. This way I am always probing the top part of the maximum and thus remain in a safe range.

The solutions here area all relevant https://stats.stackexchange.com/questions/193306/optimization-when-cost-function-slow-to-evaluate/193310#193310 — Sycorax, Sep 23 '18 at 20:20
Is there a problem with numerically computing an approximation of the gradient $\frac{f(x+\epsilon)−f(x)}{\epsilon}$ (besides risk of evaluating below threshold)? Is f(x) smooth? (If not, you can still use sub-gradient.) Convex? Continuous? A complete mess? — Matthew Gunn, Sep 23 '18 at 21:14
It's approximately a parabola, which should indeed help. I'll add that to the main post. The problem with numerically computing the gradient is only that it's slow and probably not very accurate, I figured fancier methods might be out there, but indeed it can be done. — user129412, Sep 24 '18 at 12:39
I based my idea of it being a bad idea on the discussion in https://www.reddit.com/r/MachineLearning/comments/2e8797/gradient_descent_without_a_derivative/ which I perhaps should not take as gospel. — user129412, Sep 24 '18 at 12:46

score 2 · Answer 1 · answered Sep 23 '18 at 20:19

What you're looking for is a derivative-free optimization method, and what you're describing as your current method is basically a pattern search. In one dimension (it sounds like you're x is one-dimensional but correct me if I'm wrong) a pattern search is pretty efficient and I don't think there is really any way to do it better. If your concern is that you want to implement this in programming code to automate it, then that's definitely possible (and you'll be able to find implementations of pattern search in many programming languages), but it will not (I think) be more sophisticated than what you came up with yourself.

That's great! I'll have a look at this for sure. – user129412 Sep 24 '18 at 12:40 — user129412, Sep 24 '18 at 12:40

Finding the maximum of a function $f(x)$ without analytically evaluating $f'(x)$

1 Answers1