
The problem is as follows:

$\max_x f(x) \quad \text{s.t.} \quad g(x) \leq \alpha$

We cannot assume that either $f$ or $g$ is convex; on the contrary, we should assume both are non-convex.

The functions are unknown, but we have first-order access: given an $x$, we can evaluate $f(x), \nabla f(x), g(x), \nabla g(x)$. We solve this using gradient descent.
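To make this access model concrete, here is a minimal sketch in Python. The particular functions, constraint level, and step size below are invented purely for illustration and are not part of the problem; the only point is that the solver interacts with $f$ and $g$ through value and gradient queries.

```python
import numpy as np

# Illustrative stand-ins for the unknown functions: both are bounded,
# differentiable, Lipschitz, and non-convex.  The solver only ever
# touches them through value/gradient queries at chosen points.
def f(x):       return np.sin(x).sum()        # objective to maximize
def grad_f(x):  return np.cos(x)
def g(x):       return np.cos(2.0 * x).sum()  # constraint function
def grad_g(x):  return -2.0 * np.sin(2.0 * x)

alpha = 0.0           # constraint level: x is feasible iff g(x) <= alpha
x = np.array([0.3])   # arbitrary starting point
eta = 1e-2            # step size

# A single unconstrained gradient-ascent step on f.  On its own this
# ignores the constraint, which is exactly the issue the question is
# about: what must be assumed about g for a gradient method to end up
# at a feasible point?
x = x + eta * grad_f(x)
print(f(x), g(x), g(x) <= alpha)
```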


We can assume that:

  1. $f$ and $g$ are bounded.
  2. $f$ and $g$ are differentiable and Lipschitz.

The question is: what additional assumptions on $f$ and $g$ are required so that convergence to a feasible solution, $g(x^*) \leq \alpha$, is ensured? I was thinking of something along the lines of:

  3. every local minimum of $g$ is a feasible solution, i.e. satisfies $g(x) \leq \alpha$.

Is there any assumption weaker than (3) that may still ensure convergence? And are you aware of previous work that tackles such a problem (I believe someone has, yet I have not found it) and may have discussed this?
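To illustrate why something like (3) is needed, here is a hedged toy sketch (the specific $g$, $\alpha$, and step size are invented): plain gradient descent on $g$ converges to whichever local minimum its starting point drains into, and if that local minimum is infeasible the method never produces a feasible point.

```python
import numpy as np

# Toy constraint function: bounded, smooth, Lipschitz, non-convex,
# with two local minima -- one feasible and one infeasible for this alpha.
def g(x):
    return -np.exp(-(x - 2.0) ** 2) - 0.3 * np.exp(-(x + 2.0) ** 2)

def grad_g(x):
    return (2.0 * (x - 2.0) * np.exp(-(x - 2.0) ** 2)
            + 0.6 * (x + 2.0) * np.exp(-(x + 2.0) ** 2))

alpha = -0.5  # feasible set is {x : g(x) <= -0.5}

def descend_g(x0, step=0.1, iters=5000):
    """Plain gradient descent on g starting from x0."""
    x = x0
    for _ in range(iters):
        x = x - step * grad_g(x)
    return x

for x0 in (-1.0, 1.0):
    x_star = descend_g(x0)
    print(f"start {x0:+.1f} -> x* = {x_star:+.3f}, g(x*) = {g(x_star):+.3f}, "
          f"feasible: {g(x_star) <= alpha}")
```

Starting from $x_0 = -1$ the iterates converge to the shallow local minimum near $x = -2$, which is infeasible, while starting from $x_0 = 1$ they reach the feasible one near $x = 2$; under an assumption like (3) the first case cannot occur.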

I am grateful for any help given!

Chen
  • These are far from minimal requirements. I suppose the answer must depend on exactly what you mean by a "gradient algorithm." For instance, what will your algorithm do upon encountering (or crossing) the boundary? And what exactly do you mean by "gradient (first order)" for a function that, although Lipschitz continuous, might fail to be differentiable? And surely you don't intend to impose any conditions on $g$ except in a neighborhood of the set $g(x)=\alpha,$ right? Likewise, is there any point to imposing requirements on $f$ outside the region $g(x)\le\alpha$? – whuber Sep 21 '18 at 12:20
  • Sorry, I didn't phrase the problem properly. What I mean is that we can assume the functions are continuous and bounded; I assume we also require Lipschitz continuity and something along the lines of (3). – Chen Sep 21 '18 at 13:32
  • Please try to refine and narrow your question. As it stands it is very broad and admits many different answers. Could you tell us what problem in statistics or machine learning you are trying to solve? That might narrow the scope sufficiently. – whuber Sep 21 '18 at 13:34
  • Refined the question. The setting is machine learning: given the parameters $x$ of the network, you know $f(x), \nabla f(x), g(x), \nabla g(x)$. One example of a gradient descent algorithm is the Lagrange multiplier method with a two-timescale scheme, where $\lambda$ is updated slowly compared to $x$ (a sketch of this scheme appears after these comments). The question is essentially: what properties must $g$ possess in order to ensure that, from any starting point $x_0$, the algorithm converges to a feasible solution. – Chen Sep 21 '18 at 13:46
  • I am puzzled why you are focused on $g,$ since the properties of $f$ are just as important. Are there perhaps typographical errors in your question, such as switching the roles of $f$ and $g$ in some places? – whuber Sep 21 '18 at 14:24
  • @whuber, of course the properties of $f$ are important; but my interest in $g$ comes from the fact that common methods such as Lagrange multipliers can always take $\lambda$ to $\infty$, in which case the method is equivalent to optimizing $g$ alone. This leads to the interest in $g$ and in what properties are required to ensure that a feasible solution is found. – Chen Sep 22 '18 at 07:46
  • I cannot fathom this. If you are truly interested in the constraint $g(x)\le\alpha,$ then it makes no sense to "take $\lambda$ to $\infty.$" Moreover, you have no interest in the behavior of $g$ (or $f$) in the region where $g(x)\gt\alpha+\epsilon$ (for arbitrarily small $\epsilon\gt 0$). In particular, if either $f$ or $g$ is unbounded in that region it makes no difference to the problem. – whuber Sep 23 '18 at 15:52
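For completeness, here is a minimal sketch of the two-timescale Lagrangian scheme mentioned in the comments (the specific $f$, $g$, $\alpha$, and step sizes are invented for illustration, and nothing here guarantees a feasible limit point without an extra assumption such as (3)): the primal variable $x$ ascends the Lagrangian $L(x,\lambda) = f(x) - \lambda\,(g(x) - \alpha)$ with a fast step size, while the multiplier $\lambda \ge 0$ is updated slowly in the direction of the constraint violation.

```python
import numpy as np

# Illustrative bounded, Lipschitz, non-convex objective and constraint.
def f(x):       return np.sin(x).sum()
def grad_f(x):  return np.cos(x)
def g(x):       return np.cos(2.0 * x).sum()
def grad_g(x):  return -2.0 * np.sin(2.0 * x)

alpha = 0.0  # constraint level, chosen arbitrarily for the sketch

def two_timescale(x0, eta_x=1e-2, eta_lam=1e-3, iters=50_000):
    """Primal-dual gradient scheme on L(x, lam) = f(x) - lam * (g(x) - alpha):
    fast gradient ascent in x, slow projected gradient step in lam >= 0
    (lam grows while the constraint is violated)."""
    x, lam = np.asarray(x0, dtype=float), 0.0
    for _ in range(iters):
        # fast primal step: ascend the Lagrangian in x
        x = x + eta_x * (grad_f(x) - lam * grad_g(x))
        # slow dual step: grow lam while g(x) > alpha, shrink it
        # (down to 0) once the constraint is satisfied
        lam = max(0.0, lam + eta_lam * (g(x) - alpha))
    return x, lam

x_star, lam_star = two_timescale(x0=np.array([0.3]))
print("x* =", x_star, " g(x*) =", float(g(x_star)),
      " feasible:", bool(g(x_star) <= alpha))
```

With $\eta_\lambda \ll \eta_x$ the multiplier moves on the slower timescale, as described in the comment above; whether the limit point is feasible still hinges on exactly the properties of $g$ that the question asks about.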

0 Answers