
Currently I am reading chapter 8, regression.

I feel quite confused about the following paragraph (see picture below). Does it mean that in the ridge algorithm, the sum of all weights will be less than lambda?

From the execution results I see, that seems impossible. Suppose we set a small value for lambda; then the weights will be nearly the same as in regular least-squares regression, and this constraint will not be met.

So how should I understand the following sentence?

[image: the textbook paragraph in question]

novice
    Possible duplicate of [Showing the Equivalence Between the $ {L}_{2} $ Norm Regularized Regression and $ {L}_{2} $ Norm Constrained Regression Using KKT](https://stats.stackexchange.com/questions/401212/showing-the-equivalence-between-the-l-2-norm-regularized-regression-and) You may also find it helpful to review https://stats.stackexchange.com/questions/220243/the-proof-of-shrinking-coefficients-using-ridge-regression-through-spectral-dec/220324#220324 – Sycorax Apr 17 '19 at 13:26

1 Answer


Your confusion stems from using the symbol $\lambda$ in two different ways.

In the image that you shared, the symbol $\lambda$ expresses a constraint on the sum of squares of the coefficients. The optimization program is $$ \min_\beta \text{[Some loss function of $\beta$]}\\ \text{s.t. } \sum_i\beta_i^2\le \lambda $$
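To make the constrained form concrete, here is a small sketch (toy data and a squared-error loss of my choosing, not from the book) that solves the constrained program with `scipy.optimize.minimize` and checks that the solution respects the bound $\lambda$:

```python
import numpy as np
from scipy.optimize import minimize

# Toy data (assumed for illustration): y = X @ beta_true + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 3.0]) + rng.normal(size=50)

lam = 1.0  # bound on the sum of squared coefficients

# Constrained program: min_b ||y - X b||^2  s.t.  sum_i b_i^2 <= lam
res = minimize(
    lambda b: np.sum((y - X @ b) ** 2),
    x0=np.zeros(3),
    constraints=[{"type": "ineq", "fun": lambda b: lam - np.sum(b ** 2)}],
)

print(np.sum(res.x ** 2))  # at most lam, up to solver tolerance
```

Because the unconstrained least-squares solution here has a much larger norm than $\lambda$, the constraint binds and the solver returns coefficients whose squared norm sits at the boundary.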

Equivalently, you can represent a constraint on the sum of squares of the coefficients as an unconstrained optimization problem with a penalty.

$$ \min_\beta \text{[Some loss function of $\beta$]}+\gamma\| \beta\|_2^2 $$

The important detail is that the unconstrained problem doesn't use $\lambda$; it uses another symbol which I've chosen to be $\gamma$. (The loss could be mean-square error, or binomial cross-entropy, or any other expression you seek to minimize in $\beta$.)

When $\gamma$ is small, $\|\beta \|_2$ will be larger than when $\gamma$ is large: the quantity $\| \beta \|_2$ shrinks as the penalty $\gamma$ grows. This is the opposite relation compared to the constrained optimization using $\lambda$: when $\lambda$ is large, the constraint is loose and $\|\beta\|_2$ can be large; when $\lambda$ is small, $\|\beta\|_2$ must be small.
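You can see the penalized relation numerically. Here is a sketch (my own toy data, assuming a squared-error loss) that solves the penalized form in closed form, $\beta = (X^\top X + \gamma I)^{-1} X^\top y$, and checks that $\|\beta\|_2$ shrinks as $\gamma$ grows:

```python
import numpy as np

# Toy data (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 3.0, 0.5, -1.5]) + rng.normal(size=100)

def ridge(X, y, gamma):
    """Closed-form ridge solution: (X'X + gamma*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + gamma * np.eye(p), X.T @ y)

# ||beta||_2 for increasing penalties: the norm shrinks as gamma grows
norms = [np.linalg.norm(ridge(X, y, g)) for g in [0.0, 1.0, 10.0, 100.0]]
print(norms)
```

With $\gamma = 0$ this reduces to ordinary least squares, matching the observation in the question that a tiny penalty leaves the coefficients essentially unchanged.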

The equivalence between these two expressions is established in [Showing the Equivalence Between the $ {L}_{2} $ Norm Regularized Regression and $ {L}_{2} $ Norm Constrained Regression Using KKT](https://stats.stackexchange.com/questions/401212/showing-the-equivalence-between-the-l-2-norm-regularized-regression-and).

Sycorax