Questions tagged [lagrange-multipliers]
The method of Lagrange multipliers finds critical points (including maxima and minima) of a differentiable function subject to differentiable constraints.
59 questions
19 votes, 4 answers
The proof of equivalent formulas of ridge regression
I have read the most popular books on statistical learning:
1- The Elements of Statistical Learning.
2- An Introduction to Statistical Learning.
Both mention that ridge regression has two formulations that are equivalent.
Is there an understandable…

jeza (1,527)
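For context, the two formulations usually meant here are the penalized (Lagrange) form and the constrained form. A minimal statement, with $\lambda$ and $t$ as the respective tuning parameters:
$$\hat\beta_\lambda = \arg\min_\beta \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2 \qquad \text{and} \qquad \hat\beta_t = \arg\min_{\|\beta\|_2^2 \le t} \|y - X\beta\|_2^2$$
For every $\lambda > 0$ the penalized solution also solves the constrained problem with $t = \|\hat\beta_\lambda\|_2^2$, and conversely whenever the constraint binds; a proof sketch appears under the dedicated equivalence question further down this page.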
14 votes, 1 answer
LASSO relationship between $\lambda$ and $t$
My understanding of LASSO regression is that the regression coefficients are selected to solve the minimisation problem:
$$\min_\beta \|y - X \beta\|_2^2 \quad \text{s.t.} \quad \|\beta\|_1 \leq t$$
In practice this is done using a Lagrange multiplier, making the…

ConstantAmateur (191)
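One way to see the $\lambda \leftrightarrow t$ relationship concretely is to match the two problems numerically. The sketch below (toy data; all names and parameters are illustrative) fits the penalized form with scikit-learn, sets $t$ to the $\ell_1$ norm of the resulting coefficients, and re-solves the constrained form with SciPy using the smooth split $\beta = p - q$ with $p, q \ge 0$:

import numpy as np
from sklearn.linear_model import Lasso
from scipy.optimize import minimize

# Toy data for an illustrative check of the lambda <-> t correspondence.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
beta_true = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=100)

# Penalized (Lagrange) form: sklearn minimizes (1/2n)||y - Xb||^2 + alpha*||b||_1.
lam = 0.1
b_pen = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
t = np.abs(b_pen).sum()                       # the matching constraint level

# Constrained form, smoothed by the split b = p - q with p, q >= 0,
# so that sum(p + q) <= t is a differentiable surrogate for ||b||_1 <= t.
n, d = X.shape
def obj(z):
    b = z[:d] - z[d:]
    r = y - X @ b
    return 0.5 * r @ r / n
cons = {"type": "ineq", "fun": lambda z: t - z.sum()}
res = minimize(obj, np.zeros(2 * d), bounds=[(0, None)] * (2 * d),
               constraints=[cons], method="SLSQP")
b_con = res.x[:d] - res.x[d:]
print(np.round(b_pen, 3), np.round(b_con, 3))  # the two solutions should agree closely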
13 votes, 2 answers
KKT in a nutshell graphically
Objective
Confirm whether my understanding of KKT is correct, and seek further explanation and confirmation of the KKT conditions.
Background
I am trying to understand the KKT conditions, especially the complementary slackness one, which always pops up out of the blue in SVM…

mon (685)
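For reference, for a problem $\min_x f(x)$ subject to $g_i(x) \le 0$, the KKT conditions at a candidate optimum $x^*$ with multipliers $\mu_i$ read (a standard statement, not specific to the question above):
$$\nabla f(x^*) + \sum_i \mu_i \nabla g_i(x^*) = 0, \qquad g_i(x^*) \le 0, \qquad \mu_i \ge 0, \qquad \mu_i\, g_i(x^*) = 0.$$
The last condition is complementary slackness: for each constraint, either it is active or its multiplier vanishes.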
8 votes, 1 answer
Why are the Lagrange multipliers sparse for SVMs?
I've read that for the maximal margin classifier SVM, after solving the dual problem, most of the Lagrange multipliers turn out to be zero; only the ones corresponding to the support vectors turn out to be positive.
Why is that?

Michael Litvin (285)
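One way to see this is through complementary slackness in the KKT conditions above: for the hard-margin problem, each training point satisfies
$$\alpha_i \bigl(y_i(w^T x_i + b) - 1\bigr) = 0,$$
so $\alpha_i > 0$ forces $y_i(w^T x_i + b) = 1$, i.e. $x_i$ lies exactly on the margin and is a support vector; every point strictly beyond the margin must have $\alpha_i = 0$.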
5 votes, 3 answers
SVM: Why alpha for non support vector is zero and why most vectors have zero alpha?
In the SVM optimization problem for computing the margin, we use Lagrange multipliers to incorporate the constraints:
$$L(w,b,\alpha)= \frac{1}{\lambda}\|w\| - \sum_i \alpha_i \bigl(y_i(w \cdot x_i+b) -1\bigr)$$
Now we want to compute the $\alpha$. It is stated that $\alpha$…

Code Pope (781)
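A quick empirical companion to this question (an illustrative sketch with scikit-learn; the toy data and parameters are made up): fit a linear SVM and observe that only a handful of points end up with nonzero dual coefficients.

import numpy as np
from sklearn.svm import SVC

# Toy, roughly separable two-class data (illustrative only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(len(clf.support_), "support vectors out of", len(y))
# dual_coef_ holds y_i * alpha_i for the support vectors only;
# all other training points implicitly have alpha_i = 0.
print(clf.dual_coef_)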
5 votes, 1 answer
Lagrangian multiplier: role of the constraint sign
I am a beginner learning Lagrange multipliers from the Wikipedia article.
Consider:
maximize $f(x,y)$ subject to $g(x,y) = 0$
I understand that to maximize I must follow the gradient $\nabla_{x,y} f$. I also understand that the gradient of the constraint…

Marat Zakirov (483)
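For reference, the first-order condition the method produces for this problem is the standard one:
$$\nabla_{x,y} f = \lambda\, \nabla_{x,y} g, \qquad g(x,y) = 0,$$
and for an equality constraint the sign of $\lambda$ is unrestricted; sign conditions on multipliers only arise for inequality constraints, as in the KKT conditions.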
4 votes, 0 answers
How to prove the equivalence between constrained form and Lagrange form for lasso and ridge regression?
Given the lasso (constrained form):
$$\min_\beta \frac{1}{2N}\|y-X\beta\|_2^2 \quad \text{subject to} \quad \|\beta\|_1 \leq t$$…

FantasticAI (417)
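A sketch of the standard argument in one direction (an outline under a strong-duality assumption, not the full proof): if $\hat\beta$ solves the constrained problem and $\lambda^* \ge 0$ is the KKT multiplier of the constraint, then $\hat\beta$ also minimizes the Lagrangian
$$\frac{1}{2N}\|y-X\beta\|_2^2 + \lambda^*\bigl(\|\beta\|_1 - t\bigr),$$
which, after dropping the constant $-\lambda^* t$, is exactly the Lagrange (penalized) form with $\lambda = \lambda^*$. Conversely, a penalized solution $\hat\beta_\lambda$ solves the constrained problem with $t = \|\hat\beta_\lambda\|_1$. The same argument covers ridge with $\|\cdot\|_1$ replaced by $\|\cdot\|_2^2$.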
4 votes, 1 answer
Gradient descent in SVM
I am studying SVMs and came across the following problem.
The dual optimization problem is as follows:
\begin{align*}
&\max_\alpha \quad W(\alpha) = \sum_{i=1}^{n} \alpha_i -\frac{1}{2}\sum_{i,j=1}^{n} y_i y_j \alpha_i \alpha_j \langle x_i,…

yoyo (979)
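A gradient method does apply to this dual if the constraints are handled by projection. The sketch below (illustrative names, step size, and iteration count) drops the equality constraint $\sum_i \alpha_i y_i = 0$, which amounts to an SVM without an intercept term, and keeps the box constraint $0 \le \alpha_i \le C$ by clipping:

import numpy as np

# Hedged sketch: projected gradient ascent on the (bias-free) SVM dual.
def svm_dual_ascent(X, y, C=1.0, lr=0.001, steps=2000):
    n = X.shape[0]
    K = X @ X.T                                  # linear-kernel Gram matrix
    alpha = np.zeros(n)
    for _ in range(steps):
        grad = 1.0 - y * (K @ (alpha * y))       # dW/dalpha_i
        alpha = np.clip(alpha + lr * grad, 0.0, C)  # ascent step + box projection
    w = X.T @ (alpha * y)                        # recover the primal weights
    return alpha, w

Each iteration takes an ascent step along $\partial W/\partial \alpha$ and projects back onto the box; with the equality constraint retained one would instead use a method such as SMO.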
3 votes, 0 answers
Maximum entropy probability distribution over non-negative support and finite mean?
I'm trying to derive which univariate probability distribution maximizes entropy, assuming finite mean $\mu$ and non-negative support $[0, \infty)$. I know that the answer is the exponential distribution, but I'm struggling to get there.
I start by…

Rylan Schaeffer (689)
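A sketch of the usual route (an outline only): introduce multipliers for normalization and the mean and maximize the entropy functional
$$\mathcal{L}[p] = -\int_0^\infty p \ln p \,dx + \lambda_0\Bigl(\int_0^\infty p\,dx - 1\Bigr) + \lambda_1\Bigl(\int_0^\infty x\,p(x)\,dx - \mu\Bigr).$$
Setting the functional derivative $-\ln p - 1 + \lambda_0 + \lambda_1 x$ to zero gives $p(x) \propto e^{\lambda_1 x}$; integrability forces $\lambda_1 < 0$, and enforcing the two constraints yields $p(x) = \mu^{-1} e^{-x/\mu}$, the exponential distribution.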
3 votes, 1 answer
How to choose between dual gradient descent and the method of Lagrangian multipliers?
For an optimization problem
$$
\max_x f(x) \quad \text{s.t.} \quad g(x)\le 0
$$
The Lagrangian is
$$
\mathcal L(x, \lambda)=f(x)-\lambda g(x)
$$
Dual gradient descent solves it as follows (according to page 43 of this lecture; I have modified the process to solve a maximization…

Maybe (775)
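For reference, the alternating updates usually meant by dual gradient descent for this maximization are (a sketch, with step size $\eta$):
$$x^{k+1} = \arg\max_x \mathcal L(x, \lambda^k), \qquad \lambda^{k+1} = \max\bigl(0,\ \lambda^k + \eta\, g(x^{k+1})\bigr).$$
The multiplier step follows the gradient of the dual function, growing $\lambda$ while the constraint is violated ($g > 0$) and shrinking it toward $0$ otherwise; the method of Lagrange multipliers instead solves the stationarity system directly.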
3 votes, 0 answers
Support Vector Machines: a beginner's question about the underlying math
I'm new to Support Vector Machines and I've been trying to get into the underlying math (instead of just using Scikit Learn or something like that).
I understand the math behind it up to the point where we derive this Lagrangian:
$$L = \sum \alpha…

Metrician (129)
3 votes, 1 answer
Using the technique of Lagrange multipliers
How can we use the technique of Lagrange multipliers to find a new vector of parameters $w$ that solves the optimization problem:
minimize
$$J(w) = \frac{1}{2} \|w - u\|^2$$
such that:
$$w^T (x - y) \geq 1$$
Here $u$ is the current vector of parameters…

BhishanPoudel (215)
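This particular problem has a closed-form answer worth sketching (writing $v = x - y$): stationarity of the Lagrangian $\frac{1}{2}\|w-u\|^2 - \alpha\,(w^T v - 1)$ gives $w = u + \alpha v$, and complementary slackness pins down the multiplier:
$$\alpha = \max\Bigl(0,\ \frac{1 - u^T v}{\|v\|^2}\Bigr), \qquad w = u + \alpha\, v.$$
If $u$ already satisfies the constraint then $\alpha = 0$ and $w = u$; otherwise the constraint is active and $w^T v = 1$ determines $\alpha$.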
3 votes, 0 answers
Scaling prediction from VAR model subject to a equality constraint
I have a forecasting problem and have already built a decently working VAR model which provides forecasts $\hat{Y}_{iT}$, for $i = 1,\dots,n$, where $T$ is the forecast time period.
But now I have an additional constraint that the sum of my forecasts should add…

ab90hi (525)
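If the goal is the least-squares adjustment of the forecasts to a known total $s$ (one natural reading of the truncated question; a sketch only): minimizing $\sum_i (Y_i - \hat Y_{iT})^2$ subject to $\sum_i Y_i = s$ with a single multiplier spreads the discrepancy uniformly,
$$Y_i = \hat Y_{iT} + \frac{1}{n}\Bigl(s - \sum_{j=1}^n \hat Y_{jT}\Bigr).$$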
2 votes, 1 answer
KKT Conditions for thresholds?
My main question is that when I use Lagrange multipliers/KKT conditions to perform optimization with threshold constraints, I seem to get contradictory first-order conditions (FOC).
Here is a characteristic example:
take an optimization problem like the…

naveace (23)
2 votes, 0 answers
Gradient descent finds local minima for a problem that can be formulated as a convex problem
I am trying to find
$$\min_W \|Y-XW\|_F^2 \quad \text{s.t.} \quad W_{ij}\geq 0 \ \ \forall\, i,j$$
where $X$ is the input data and $Y$ is the output data we try to fit. This is a convex optimization problem that can be solved with quadratic programming.
As an exercise, I…

CWC (251)
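For the elementwise-nonnegative reading of the constraint, a minimal projected-gradient sketch looks as follows (function name, step size, and iteration count are illustrative assumptions):

import numpy as np

# Hedged sketch: projected gradient descent for min ||Y - XW||_F^2 s.t. W >= 0.
def nonneg_least_squares(X, Y, steps=5000):
    d, k = X.shape[1], Y.shape[1]
    L = 2 * np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    W = np.zeros((d, k))
    for _ in range(steps):
        grad = 2 * X.T @ (X @ W - Y)           # gradient of the smooth objective
        W = np.maximum(W - grad / L, 0.0)      # gradient step + projection onto W >= 0
    return W

Because the projection onto the feasible set $\{W \ge 0\}$ is exact and the objective is convex, the iterates converge to a global minimizer rather than a merely local one.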