
Suppose I want to maximize the likelihood $L(\theta_1, \theta_2)$ subject to a single constraint, for example $\theta_1 + \theta_2 = 1$, and no other constraints.

  1. Can I just replace $\theta_2$ with $1 - \theta_1$ in the likelihood and then do gradient descent on $\theta_1$ alone? If I can, or cannot, why?

  2. Can I set up an objective function with a Lagrange multiplier, $\mathcal{L} = L(\theta_1, \theta_2) + \lambda (\theta_1 + \theta_2 - 1)$, and run a gradient descent algorithm on $\mathcal{L}$? If I can, or cannot, why?

  3. Is projected gradient descent the only option I can rely on if I want to solve this constrained optimization problem using gradient descent?

EDIT: I tried all 3 options, and maybe my likelihood function is not "regular", because only option 3 works :'( I would like to know why and when options 1 and 2 work.
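For concreteness, here is a toy sketch (Python/NumPy) of what I mean by the three options. The binomial-style likelihood, the counts `n1, n2`, and the step sizes are made up for illustration and are not my real model; I take gradient ascent steps on the log-likelihood, which is the same as gradient descent on its negative.

```python
import numpy as np

# Made-up binomial-style likelihood (NOT my real model):
#   L(theta1, theta2) = theta1^n1 * theta2^n2   subject to   theta1 + theta2 = 1.
# Its closed-form MLE is theta1 = n1 / (n1 + n2) = 0.3, so all three options
# should recover roughly (0.3, 0.7).
n1, n2 = 30.0, 70.0
N = n1 + n2          # I use the average log-likelihood so gradients are O(1)
lr, steps = 0.01, 5000

# Option 1: substitute theta2 = 1 - theta1, then plain gradient ascent on theta1 alone.
t1 = 0.5
for _ in range(steps):
    grad = (n1 / t1 - n2 / (1.0 - t1)) / N          # d/dtheta1 of the substituted objective
    t1 = np.clip(t1 + lr * grad, 1e-6, 1 - 1e-6)
print("option 1:", t1)

# Option 2: Lagrangian  La = avg-loglik + lam * (theta1 + theta2 - 1).
# Its stationary point is a saddle, so I ascend in theta and descend in lam.
# With tuned step sizes this works on the toy problem, but naive simultaneous
# updates can oscillate or converge very slowly, which may be why it failed on my real model.
t, lam, lr_lam = np.array([0.3, 0.4]), 0.0, 0.5
for _ in range(steps):
    grad_t = np.array([n1 / t[0], n2 / t[1]]) / N + lam   # dLa/dtheta
    g = t.sum() - 1.0                                      # dLa/dlam = constraint violation
    t = np.clip(t + lr * grad_t, 1e-6, None)
    lam = lam - lr_lam * g
print("option 2:", t, "violation:", t.sum() - 1.0)

# Option 3: projected gradient ascent -- unconstrained gradient step on the
# avg log-likelihood, then Euclidean projection onto the plane theta1 + theta2 = 1.
t = np.array([0.3, 0.4])
for _ in range(steps):
    grad = np.array([n1 / t[0], n2 / t[1]]) / N
    t = t + lr * grad
    t = t - (t.sum() - 1.0) / 2.0                          # projection onto the plane
    t = np.clip(t, 1e-6, None)
print("option 3:", t)
```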

Thank you very much in advance for all the help.

wut
  • The first option is fine. The second option, you'll need an absolute value in there. – Arya McCarthy Jun 18 '21 at 02:31
  • @AryaMcCarthy Thanks for your comment. I wonder where I need an absolute value. Btw I tried all 3 options and maybe my likelihood function is not "regular" and only option 3 works :'( I would like to know why and when options 1 and 2 work. – wut Jun 18 '21 at 02:45

1 Answer


The first option is still a constrained problem, since $\theta_1$ still has to lie in $(0,1)$.

You can look at the following reparametrization to convert the constrained problem into a truly unconstrained optimization:

Let $\log \theta_1 = \alpha_1 - \log (e^{\alpha_1}+e^{\alpha_2})$ and $\log \theta_2 = \alpha_2 - \log (e^{\alpha_1}+e^{\alpha_2})$. As you can see, this reparametrization preserves the constraint, since $\theta_1 = \frac{e^{\alpha_1}}{e^{\alpha_1}+e^{\alpha_2}}$ and $\theta_2 = \frac{e^{\alpha_2}}{e^{\alpha_1}+e^{\alpha_2}}$ sum to $1$. Here, $\alpha_1,\alpha_2$ are unconstrained.
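To run a gradient method on $(\alpha_1, \alpha_2)$, the chain rule through this reparametrization gives (writing $\ell = \log L$ and $\delta_{ij}$ for the Kronecker delta)

$$\frac{\partial \ell}{\partial \alpha_i} \;=\; \sum_j \frac{\partial \ell}{\partial \theta_j}\,\frac{\partial \theta_j}{\partial \alpha_i} \;=\; \sum_j \frac{\partial \ell}{\partial \theta_j}\,\theta_j\,(\delta_{ij} - \theta_i),$$

so you can take plain, unconstrained gradient steps in $\alpha$.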

This is the softmax reparametrization (its normalizer $\log(e^{\alpha_1}+e^{\alpha_2})$ is a log-sum-exp), and one place where I know it has been used is gradient-based estimation of Gaussian mixture models.
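Here is a minimal sketch of what this looks like (Python/NumPy); the binomial-style log-likelihood and the counts are made up as a stand-in for your actual likelihood:

```python
import numpy as np

# Made-up two-parameter, binomial-style log-likelihood standing in for yours:
#   loglik(theta) = n1*log(theta1) + n2*log(theta2),  MLE at theta1 = 0.3.
n1, n2 = 30.0, 70.0

def theta_from_alpha(a):
    a = a - a.max()                      # shift for numerical stability
    e = np.exp(a)
    return e / e.sum()                   # theta_i = exp(alpha_i) / (exp(alpha_1) + exp(alpha_2))

alpha = np.zeros(2)                      # unconstrained parameters
lr = 0.01
for _ in range(2000):
    t = theta_from_alpha(alpha)
    grad_t = np.array([n1 / t[0], n2 / t[1]])     # d loglik / d theta
    jac = np.diag(t) - np.outer(t, t)             # softmax Jacobian: d theta_j / d alpha_i
    grad_a = jac @ grad_t                         # chain rule, as in the formula above
    alpha = alpha + lr * grad_a                   # plain unconstrained gradient ascent
print(theta_from_alpha(alpha))                    # ~ (0.3, 0.7), sums to 1 by construction
```

Because $\theta$ comes out of the softmax, every iterate satisfies $\theta_1 + \theta_2 = 1$ (and $\theta_i > 0$) by construction, so no projection or other constraint handling is needed.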

honeybadger