
Currently I am reading chapter 8, regression.

I feel quite confused about the following paragraph (see picture below). Does it mean that in the ridge algorithm, the sum of all weights will be less than lambda?

From the execution results I see, that seems impossible. Suppose we set a small value for lambda; then the weights will be nearly the same as in regular least-squares regression, and this constraint will not be met.

So how should I understand the following sentence?

[image: the textbook paragraph in question]

novice
    Possible duplicate of [Showing the Equivalence Between the $ {L}_{2} $ Norm Regularized Regression and $ {L}_{2} $ Norm Constrained Regression Using KKT](https://stats.stackexchange.com/questions/401212/showing-the-equivalence-between-the-l-2-norm-regularized-regression-and) You may also find it helpful to review https://stats.stackexchange.com/questions/220243/the-proof-of-shrinking-coefficients-using-ridge-regression-through-spectral-dec/220324#220324 – Sycorax Apr 17 '19 at 13:26

1 Answer


Your confusion stems from using the symbol $\lambda$ in two different ways.

In the image that you shared, the symbol $\lambda$ expresses a constraint on the sum of squares of the coefficients. The optimization program is $$ \min_\beta \text{[Some loss function of $\beta$]}\\ \text{s.t. } \sum_i\beta_i^2\le \lambda $$
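To make the constrained form concrete, here is a small sketch (toy data and a squared-error loss of my choosing, not from the book) that solves the constrained program with `scipy.optimize.minimize` and checks that the solution respects the bound $\lambda$:

```python
import numpy as np
from scipy.optimize import minimize

# Toy data (assumed for illustration): y = X @ beta_true + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 3.0]) + rng.normal(size=50)

lam = 1.0  # bound on the sum of squared coefficients

# Constrained program: min_b ||y - X b||^2  s.t.  sum_i b_i^2 <= lam
res = minimize(
    lambda b: np.sum((y - X @ b) ** 2),
    x0=np.zeros(3),
    constraints=[{"type": "ineq", "fun": lambda b: lam - np.sum(b ** 2)}],
)

print(np.sum(res.x ** 2))  # at most lam, up to solver tolerance
```

Because the unconstrained least-squares solution here has a much larger norm than $\lambda$, the constraint binds and the solver returns coefficients whose squared norm sits at the boundary.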

Equivalently, you can represent a constraint on the sum of squares of the coefficients as an unconstrained optimization problem with a penalty.

$$ \min_\beta \text{[Some loss function of $\beta$]}+\gamma\| \beta\|_2^2 $$

The important detail is that the unconstrained problem doesn't use $\lambda$; it uses another symbol which I've chosen to be $\gamma$. (The loss could be mean-square error, or binomial cross-entropy, or any other expression you seek to minimize in $\beta$.)

When $\gamma$ is small, $\|\beta \|_2$ will be larger than when $\gamma$ is large: the quantity $\| \beta \|_2$ shrinks as the penalty $\gamma$ grows. This is the opposite relation compared to the constrained optimization using $\lambda$: when $\lambda$ is large, the constraint is loose and $\|\beta\|_2$ can be large; when $\lambda$ is small, $\|\beta\|_2$ must be small.
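You can see the penalized relation numerically. Here is a sketch (my own toy data, assuming a squared-error loss) that solves the penalized form in closed form, $\beta = (X^\top X + \gamma I)^{-1} X^\top y$, and checks that $\|\beta\|_2$ shrinks as $\gamma$ grows:

```python
import numpy as np

# Toy data (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 3.0, 0.5, -1.5]) + rng.normal(size=100)

def ridge(X, y, gamma):
    """Closed-form ridge solution: (X'X + gamma*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + gamma * np.eye(p), X.T @ y)

# ||beta||_2 for increasing penalties: the norm shrinks as gamma grows
norms = [np.linalg.norm(ridge(X, y, g)) for g in [0.0, 1.0, 10.0, 100.0]]
print(norms)
```

With $\gamma = 0$ this reduces to ordinary least squares, matching the observation in the question that a tiny penalty leaves the coefficients essentially unchanged.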

The equivalence between these two expressions is established in [Showing the Equivalence Between the $ {L}_{2} $ Norm Regularized Regression and $ {L}_{2} $ Norm Constrained Regression Using KKT](https://stats.stackexchange.com/questions/401212/showing-the-equivalence-between-the-l-2-norm-regularized-regression-and).

Sycorax