
I understand that a convex function is a nice objective function, since a local minimum is the global minimum. However, there are non-convex functions that also have this property. For example:

(figure: a non-convex function whose only local minimum is also the global minimum)

It seems to me that, as long as a local minimum is the global minimum, we should be able to use gradient descent (or a similar method) to find the global minimum. So my question is: why do convex functions get more attention than other functions that also have this property? Thanks!

[UPDATE] Thanks for the many great comments and answers. I taught myself the concepts of optimization, and I admit much of my wording is imprecise. Do you have any suggested references on the more practical aspects of convex optimization? Boyd's book is probably too intense for me.

user1036719
  • Although not all functions with unique minima are convex, all convex functions have unique minima. Often one can check that a function is convex much, much more easily than whether it has a unique minimum. That alone merits the attention. In what sense, though, are *all* ML researchers interested "mostly" in convex functions? Do you have some support for this, or is it just a personal impression? – whuber Jun 21 '16 at 03:03
  • @whuber: I modified my wording. I don't have support for "all ML researchers are interested mostly in convex functions", and in fact I didn't mean to give readers that impression, because I also think that is a bold claim. All I can say is that English is not my first language, and I probably spent too much of my brain power on "grammatical correctness" rather than "semantic correctness" when I typed the sentences. Thanks for your comment. – user1036719 Jun 21 '16 at 03:17
  • Note that the function you show actually has a related property: it is *quasiconvex* (its sublevel sets are convex). That said, convexity admits a rich calculus that, itself, yields many useful algorithms with verifiable optimality (sometimes referred to as *certificates of $\epsilon$-optimality*) through primal-dual relationships. As Rockafellar said: "(...) in fact, the great watershed in optimization isn't between linearity and nonlinearity, but convexity and nonconvexity". – cardinal Jun 21 '16 at 05:23
  • A very minor quibble with @whuber's comment: he is implicitly referring to *strictly* convex functions. (Many convex functions have multiple minima, e.g., $f(x) = 0$.) – cardinal Jun 21 '16 at 05:25
  • If you find Boyd's book too intense, you might find Kelley's [Iterative Methods for Optimization](http://www.siam.org/books/textbooks/fr18_book.pdf) more palatable. Nocedal and Wright's [Numerical Optimization](http://www.springer.com/us/book/9780387303031) is considered pretty standard too (but I don't know if it is free). These books do not focus exclusively on convex optimisation. – usεr11852 Jun 21 '16 at 06:00
  • Neural nets are perhaps the best-known case of an ML model with a non-convex objective function, and currently it feels like all ML researchers are doing neural nets! – seanv507 Jun 21 '16 at 06:05

1 Answer


First of all, as @whuber mentioned, many people do work on non-convex optimisation. Having said that, your criterion "as long as the local minimum is the global minimum" is rather... lax. See for instance the Easom function, $f(x,y) = -\cos(x)\cos(y)\exp(-(x-\pi)^2 - (y-\pi)^2)$. It has a single minimum, if that is your concern, but if you are even remotely away from it you are... stuffed. Standard gradient-based methods like BFGS (BFGS in R's optim) and Conjugate Gradient (CG in R's optim) will suffer greatly. You essentially have to make an "educated guess" about your answer (e.g. Simulated Annealing, SANN in R's optim), which is a very computationally expensive routine.

(figure: surface plot of the Easom function)

In R:

easom <- function(x) {
  -cos(x[1]) * cos(x[2]) * exp(-(x[1] - pi)^2 - (x[2] - pi)^2)
}
optim(par = c(0, 0), fn = easom, method = "BFGS")$par # 1.664149e-06 1.664149e-06 # Junk
optim(par = c(0, 0), fn = easom, method = "CG")$par   # 0 0 # Insulting Junk
optim(par = c(0, 0), fn = easom, method = "SANN")$par # 3.382556 2.052309 # Some success!

There are other, even worse, surfaces to optimise over. See for example Michalewicz's or Schwefel's functions, which have multiple local minima and/or large flat regions; a small sketch with the Schwefel function follows.
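As a minimal sketch (my illustration, not part of the original answer): the 2-D Schwefel function has its global minimum near c(420.97, 420.97), surrounded by many local minima, so a gradient-based run started at the origin gets trapped far away from it.

# 2-D Schwefel function; global minimum ~0 near x = c(420.9687, 420.9687)
schwefel <- function(x) {
  418.9829 * length(x) - sum(x * sin(sqrt(abs(x))))
}
optim(par = c(0, 0), fn = schwefel, method = "BFGS")$par # a nearby local minimum, nowhere near c(420.97, 420.97)
optim(par = c(0, 0), fn = schwefel, method = "SANN")$par # stochastic; varies by run and may still miss the global minimum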

This flatness is a real problem. For example, in generalised as well as standard linear mixed-effects models, as the number of estimated parameters increases the log-likelihood function, even after profiling out the residual variance and the fixed-effects parameters, can still be very flat. This can lead the model to converge on the boundary of the parameter space or simply to a suboptimal solution (this is actually one of the reasons some people, myself included, are sceptical of the "keep it maximal" idea for LMEs); see the sketch below. Therefore "how much convex" your objective function is can have a big impact on your model as well as your later inference.
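As a hedged sketch of that boundary problem (my illustration, not part of the original answer; it assumes the lme4 package): simulate data whose true random-slope variance is zero and fit the "maximal" random-effects structure. The profiled log-likelihood is then nearly flat in the slope variance, and the estimate typically ends up on the boundary, i.e. a singular fit.

library(lme4)
set.seed(1)
g <- gl(20, 10)                                       # 20 groups, 10 observations each
x <- rep(seq(0, 1, length.out = 10), 20)
y <- 1 + 2 * x + rnorm(20, sd = 0.5)[g] + rnorm(200)  # random intercepts only in the truth, no random slope
fit <- lmer(y ~ x + (1 + x | g))                      # "keep it maximal": random intercept + slope per group
isSingular(fit)                                       # typically TRUE: slope variance estimated at the boundary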

usεr11852
  • +1 I am glad you finished with the thoughts about "how much convex," because I couldn't help thinking that even when a function is convex, finding an optimum numerically can be problematic. The thread at http://stats.stackexchange.com/questions/7308 presents an example of that in a real statistical application. (Do note that this function is not globally convex; but I believe it is convex within a neighborhood of the two "solutions" that were found numerically, which is what matters.) – whuber Jun 21 '16 at 13:06