Questions tagged [lagrange-multipliers]

The method of Lagrange multipliers finds critical points (including maxima and minima) of a differentiable function subject to differentiable constraints.

59 questions
19 votes • 4 answers

The proof of equivalent formulas of ridge regression

I have read the most popular books in statistical learning: 1) The Elements of Statistical Learning and 2) An Introduction to Statistical Learning. Both mention that ridge regression has two formulations that are equivalent. Is there an understandable…
jeza • 1,527
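A quick sketch of the equivalence the question asks about, in standard notation (not quoted from either book): the penalized (Lagrange) form
$$\hat\beta_\lambda = \arg\min_\beta \|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2$$
and the constrained form
$$\min_\beta \|y - X\beta\|_2^2 \quad \text{s.t.}\quad \|\beta\|_2^2 \leq t$$
give the same solutions: for each $\lambda \ge 0$, setting $t = \|\hat\beta_\lambda\|_2^2$ makes $\hat\beta_\lambda$ optimal for the constrained problem, and for each binding $t$ there is a corresponding multiplier $\lambda$.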
14 votes • 1 answer

LASSO relationship between $\lambda$ and $t$

My understanding of LASSO regression is that the regression coefficients are selected to solve the minimisation problem: $$\min_\beta \|y - X\beta\|_2^2 \quad \text{s.t.}\quad \|\beta\|_1 \leq t$$ In practice this is done using a Lagrange multiplier, making the…
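A rough sketch of the correspondence being asked about, assuming the usual lasso notation: the Lagrangian of the constrained problem is
$$\mathcal L(\beta, \lambda) = \|y - X\beta\|_2^2 + \lambda\bigl(\|\beta\|_1 - t\bigr),$$
so when the constraint binds there is a multiplier $\lambda(t) \ge 0$ for which the penalized problem $\min_\beta \|y - X\beta\|_2^2 + \lambda\|\beta\|_1$ has the same minimizer; the mapping between $\lambda$ and $t$ is data-dependent and has no general closed form.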
13 votes • 2 answers

KKT in a nutshell graphically

Objective: Confirm whether my understanding of KKT is correct, and seek further explanation and confirmation. Background: I am trying to understand the KKT conditions, especially the complementary slackness one, which always pops up out of the blue in SVM…
mon • 685
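For reference, the four KKT conditions for $\min_x f(x)$ subject to $g_i(x) \le 0$ (a standard statement, not taken from any particular answer):
$$\nabla f(x^*) + \sum_i \mu_i \nabla g_i(x^*) = 0, \qquad g_i(x^*) \le 0, \qquad \mu_i \ge 0, \qquad \mu_i\, g_i(x^*) = 0,$$
i.e. stationarity, primal feasibility, dual feasibility, and the complementary slackness condition that the question singles out.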
8 votes • 1 answer

Why are the Lagrange multipliers sparse for SVMs?

I've read that for the maximal-margin-classifier SVM, after solving the dual problem, most of the Lagrange multipliers turn out to be zero; only the ones corresponding to the support vectors turn out to be positive. Why is that?
Michael Litvin • 285
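A one-line sketch of the usual argument, in standard hard-margin SVM notation (not quoted from a specific answer): complementary slackness requires
$$\alpha_i\bigl(y_i(w^\top x_i + b) - 1\bigr) = 0 \quad \text{for every } i,$$
so any point strictly outside the margin, $y_i(w^\top x_i + b) > 1$, must have $\alpha_i = 0$; only points lying exactly on the margin (the support vectors) can carry $\alpha_i > 0$.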
5 votes • 3 answers

SVM: Why alpha for non support vector is zero and why most vectors have zero alpha?

In the SVM optimization problem for computing the margin, we use Lagrange multipliers to insert the constraint: $$L(w,b,\alpha)= \frac{1}{\lambda}\|w\| - \sum_i \alpha_i \bigl(y_i(w \cdot x_i+b) -1\bigr)$$ Now we want to compute the $\alpha$. It is stated that $\alpha$…
Code Pope • 781
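A related sketch, assuming the standard form of the objective ($\tfrac{1}{2}\|w\|^2$ rather than the excerpt's $\tfrac{1}{\lambda}\|w\|$): stationarity of the Lagrangian in $w$ gives
$$w = \sum_i \alpha_i y_i x_i,$$
so the hyperplane is built only from points with $\alpha_i > 0$; together with complementary slackness this is why most $\alpha_i$ are zero.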
5 votes • 1 answer

Lagrangian multiplier: role of the constraint sign

I am a beginner learning about Lagrange multipliers from the Wikipedia article. Consider: maximize $f(x,y)$ subject to $g(x,y) = 0$. I understand that to maximize I must follow the gradient $\nabla_{x, y} f$. I also understand that the gradient of the constraint…
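A compact statement of the condition behind the question, in standard notation: at a constrained extremum
$$\nabla_{x,y} f = \lambda\, \nabla_{x,y} g,$$
and for an equality constraint $g(x,y) = 0$ the sign of $\lambda$ is unrestricted, since replacing $g$ by $-g$ describes the same feasible set and merely flips the multiplier's sign; the sign only becomes meaningful once the constraint is an inequality (the KKT case).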
4 votes • 0 answers

How to prove the equivalence between constrained form and Lagrange form for lasso and ridge regression?

How to prove the equivalence between the constrained form and the Lagrange form for lasso and ridge regression? Given the lasso (constrained form): $$\min_\beta \frac{1}{2N}\|y-X\beta\|_2^2 \quad \text{subject to} \quad \|\beta\|_1 \leq t$$…
FantasticAI • 417
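The ridge case of this equivalence is easy to check numerically; below is a minimal sketch (toy data and made-up shapes, not the asker's setup), solving the penalized form with scikit-learn and the constrained form with SciPy, and confirming that the two solutions coincide when $t$ is set to the squared norm of the penalized solution. The lasso case works the same way with an $\ell_1$ ball.

```python
# Numerical sanity check of the penalized/constrained ridge equivalence.
# A sketch with toy data, not a proof.
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
beta_true = np.array([2.0, -1.0, 0.0, 0.5, 3.0])
y = X @ beta_true + rng.normal(scale=0.1, size=50)

lam = 5.0  # penalty strength in the Lagrange (penalized) form

# Penalized form: min ||y - X b||^2 + lam * ||b||^2
beta_pen = Ridge(alpha=lam, fit_intercept=False).fit(X, y).coef_

# Constrained form with t set to the squared norm of the penalized solution:
# min ||y - X b||^2  s.t.  ||b||^2 <= t
t = np.sum(beta_pen ** 2)
rss = lambda b: np.sum((y - X @ b) ** 2)
cons = {"type": "ineq", "fun": lambda b: t - np.sum(b ** 2)}
beta_con = minimize(rss, np.zeros(5), method="SLSQP", constraints=[cons]).x

print(np.allclose(beta_pen, beta_con, atol=1e-4))  # expected: True
```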
4 votes • 1 answer

Gradient descent in SVM

I am studying SVMs and came across the following problem. The dual optimization problem is as follows: \begin{align*} &\max_\alpha\ W(\alpha) = \sum_{i=1}^{n} \alpha_i -\frac{1}{2}\sum_{i,j=1}^{n} y_i y_j \alpha_i \alpha_j \langle x_i,…
yoyo • 979
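Since the excerpt is truncated, here is only a sketch of what a gradient-based attack on the dual can look like, under simplifying assumptions (hard margin, no bias term, so the equality constraint $\sum_i \alpha_i y_i = 0$ drops out and the feasible set is just $\alpha_i \ge 0$); a real solver such as SMO handles the full constraint set.

```python
# Projected gradient ascent on a simplified SVM dual (hard margin, no bias).
# Toy data; illustrative only.
import numpy as np

def svm_dual_ascent(X, y, lr=0.01, n_iter=2000):
    n = X.shape[0]
    Q = (y[:, None] * X) @ (y[:, None] * X).T   # Q_ij = y_i y_j <x_i, x_j>
    alpha = np.zeros(n)
    for _ in range(n_iter):
        grad = 1.0 - Q @ alpha                  # gradient of W(alpha)
        alpha = alpha + lr * grad               # ascent step
        alpha = np.maximum(alpha, 0.0)          # project onto alpha_i >= 0
    return alpha

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = svm_dual_ascent(X, y)
w = (alpha * y) @ X                             # w = sum_i alpha_i y_i x_i
print(alpha.round(3), w.round(3))
```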
3 votes • 0 answers

Maximum entropy probability distribution over non-negative support and finite mean?

I'm trying to derive which univariate probability distribution maximizes entropy, assuming finite mean $\mu$ and non-negative support $[0, \infty)$. I know that the answer is the exponential distribution, but I'm struggling to get there. I start by…
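A sketch of the Lagrangian step, using the usual calculus-of-variations shortcut with details omitted: maximize $-\int_0^\infty p \ln p \, dx$ subject to $\int_0^\infty p \, dx = 1$ and $\int_0^\infty x\, p \, dx = \mu$. The pointwise Lagrangian is
$$-p\ln p + \lambda_0\, p + \lambda_1\, x\, p,$$
and setting its derivative with respect to $p(x)$ to zero gives $\ln p = \lambda_0 - 1 + \lambda_1 x$, i.e. $p(x) \propto e^{\lambda_1 x}$; the two constraints then force $\lambda_1 = -1/\mu$ and the normalization, recovering $p(x) = \tfrac{1}{\mu} e^{-x/\mu}$.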
3 votes • 1 answer

How to choose between dual gradient descent and the method of Lagrangian multipliers?

For an optimization problem $$ \max_x f(x) \quad \text{s.t.}\quad g(x)\le 0 $$ the Lagrangian is $$ \mathcal L(x, \lambda)=f(x)-\lambda g(x) $$ Dual gradient descent solves it by (according to page 43 of this lecture; I modify the process for solving a maximization…
Maybe • 775
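For reference, the two procedures being compared, in the excerpt's notation with $\mathcal L(x,\lambda) = f(x) - \lambda g(x)$ and $\lambda \ge 0$: dual gradient descent alternates
$$x^{k+1} = \arg\max_x \mathcal L(x, \lambda^k), \qquad \lambda^{k+1} = \max\bigl(0,\ \lambda^k + \alpha\, g(x^{k+1})\bigr),$$
i.e. it maximizes over $x$ and then takes a projected gradient step on the dual function in $\lambda$, whereas the classical method of Lagrange multipliers solves the stationarity and feasibility equations directly in one shot.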
3 votes • 0 answers

Support Vector Machines: a beginner's question about the underlying math

I'm new to Support Vector Machines and I've been trying to get into the underlying math (instead of just using Scikit Learn or something like that). I understand the math behind it up to the point where we derive this Lagrangian: $$L = \sum \alpha…
3 votes • 1 answer

Using technique of Lagrange Multiplier

How can we use the technique of Lagrange multipliers to find a new vector of parameters $w$ which solves the optimization problem: minimize $J(w) = \frac{1}{2} \| w - u \|^2$ subject to $w^T (x - y) \geq 1$. Here $u$ is the current vector of parameters…
BhishanPoudel • 215
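Since the feasible set here is a half-space, this particular problem has a short closed-form answer; a sketch in the question's notation: the Lagrangian is $\tfrac{1}{2}\|w-u\|^2 - \mu\bigl(w^\top(x-y) - 1\bigr)$ with $\mu \ge 0$, stationarity gives $w = u + \mu\,(x-y)$, and complementary slackness yields
$$\mu = \max\!\left(0,\ \frac{1 - u^\top(x-y)}{\|x-y\|^2}\right),$$
so $w = u$ when $u$ already satisfies the constraint, and otherwise $w$ is the projection of $u$ onto the hyperplane $w^\top(x-y) = 1$.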
3 votes • 0 answers

Scaling prediction from VAR model subject to a equality constraint

I have a forecasting problem and have already built a decently working VAR model which provides forecasts $\hat{Y}_{iT}$ for $i = 1,\dots,n$, where $T$ is the forecast time period. But now I have an additional constraint that the sum of my forecasts should add…
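One common Lagrange-multiplier treatment of this kind of reconciliation, sketched under the assumption that the adjustment should be smallest in the least-squares sense (the excerpt does not fix a criterion): solve $\min_{\tilde y} \sum_i (\tilde y_i - \hat{Y}_{iT})^2$ subject to $\sum_i \tilde y_i = c$. Stationarity gives $\tilde y_i = \hat{Y}_{iT} + \lambda/2$, and the constraint then fixes the multiplier, so
$$\tilde y_i = \hat{Y}_{iT} + \frac{1}{n}\Bigl(c - \sum_j \hat{Y}_{jT}\Bigr),$$
i.e. the total discrepancy is spread evenly across the $n$ forecasts; a weighted objective spreads it in proportion to chosen weights.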
2 votes • 1 answer

KKT Conditions for thresholds?

My main question: when I use Lagrange multipliers/KKT conditions to perform optimization with threshold constraints, I seem to get contradictory first-order conditions. Here is a characteristic example: take an optimization problem like the…
2 votes • 0 answers

Gradient descent finds local minima for a problem that can be formulated as a convex problem

I am trying to find $$ \min_W \|Y-XW\|_F^2 \quad \text{s.t.}\quad W_{ij}\geq 0 \ \ \forall\, i,j $$ where $X$ is the input data and $Y$ is the output data we try to fit. This is a convex optimization problem that can be solved with quadratic programming. As an exercise, I…
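Assuming the intended constraint is entrywise non-negativity (which is what makes the problem convex), here is a minimal projected-gradient-descent sketch with toy data; projection onto the non-negative orthant is just clipping, which is what keeps the iterates feasible.

```python
# Projected gradient descent for min ||Y - X W||_F^2 with W >= 0 entrywise.
# Toy data and shapes; illustrative only.
import numpy as np

def pgd_nonneg(X, Y, n_iter=5000):
    d, k = X.shape[1], Y.shape[1]
    lr = 1.0 / (2 * np.linalg.norm(X, 2) ** 2)  # 1/L step, L = 2 * sigma_max(X)^2
    W = np.zeros((d, k))
    for _ in range(n_iter):
        grad = 2 * X.T @ (X @ W - Y)   # gradient of the Frobenius loss
        W = W - lr * grad              # gradient step
        W = np.maximum(W, 0.0)         # project onto the nonnegative orthant
    return W

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
W_true = np.abs(rng.normal(size=(4, 3)))
Y = X @ W_true + 0.01 * rng.normal(size=(100, 3))
W_hat = pgd_nonneg(X, Y)
print(np.abs(W_hat - W_true).max())    # small for this well-conditioned toy setup
```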
Page 1 of 4