In the context of LASSO logistic regression, I understand that $\lambda$ is the tuning parameter obtained by cross-validation. There is also the constraint parameter $s$ ($\sum_{i=1}^p |\hat\beta_i| \le s$).

  1. How is the constraint parameter $s$ chosen?

  2. How are $\lambda$, $s$, and the shrinking of the $\hat\beta_i$ to zero related to each other?

  3. What is the decision process, i.e., how is it that some $\hat\beta_i$ are shrunk to zero and some are not?

  • You don't have to choose $\lambda$ by cross-validation; you can specify it a priori. Cross-validation is just a common strategy when you don't already know the ideal $\lambda$ for your case. Note that there is a correspondence between $\lambda$ and $s$, so choosing a $\lambda$ implies choosing an $s$ and vice versa. – gung - Reinstate Monica Nov 08 '14 at 17:56
  • @gung thanks, so $\lambda$ and $s$ are the same? Then how are some parameter estimates $\beta_i$ shrunk to zero while others are not? – Tyrone Williams Nov 08 '14 at 17:59
  • $\lambda$ & $s$ are *not* the same; there is simply a correspondence between them. Someone can give you a full answer explaining the LASSO. – gung - Reinstate Monica Nov 08 '14 at 18:03
  • @gung okay cool, so what is the constraint parameter $s$, and how do you choose $s$? – Tyrone Williams Nov 08 '14 at 18:08

1 Answer

Consider the original formulation of the lasso problem in a linear regression setting: $$ \min_\beta \|y - X \beta\|_2^2 \quad \text{s.t.} \quad \|\beta\|_1 \leq s $$ To do the optimization, we use a Lagrange multiplier and reformulate the problem as $$ \min_\beta \|y - X \beta\|_2^2 + \lambda \|\beta\|_1 $$ From the two formulations you can see the connection between $\lambda$ and $s$.

(1) As $s$ goes to infinity, the constraint becomes inactive and the problem reduces to ordinary least squares; accordingly, $\lambda$ goes to 0.

(2) As $s$ goes to 0, all the $\beta$'s shrink to 0, as is easily seen from the first formulation; accordingly, $\lambda$ goes to infinity.
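You can also see this correspondence numerically: fit the lasso over a grid of $\lambda$ values and record $\|\hat\beta\|_1$, which is exactly the implied $s$. A minimal sketch (not part of the original answer), assuming scikit-learn and synthetic data; `Lasso`'s `alpha` argument plays the role of $\lambda$:

```python
# Sketch (assumes scikit-learn): trace the lambda <-> s correspondence.
# Lasso's `alpha` plays the role of lambda; s is the implied L1 norm.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

for lam in [0.01, 0.1, 1.0, 10.0, 100.0]:
    beta = Lasso(alpha=lam).fit(X, y).coef_
    s = np.abs(beta).sum()           # implied constraint level s = ||beta||_1
    n_zero = int(np.sum(beta == 0))  # coefficients shrunk exactly to zero
    print(f"lambda={lam:7.2f}  s={s:8.2f}  zeros={n_zero}")
```

As $\lambda$ grows, the printed $s$ shrinks and more coefficients become exactly zero, matching points (1) and (2).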

In other words, $\lambda$ and $s$ have an inverse relationship. Now for your questions.

  1. How is the constraint parameter $s$ chosen?

In practice, you only need to choose $\lambda$, typically by cross-validation, as others have pointed out; you never have to pick $s$ directly. (A sketch of the cross-validation step follows below.)
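Since the question is about logistic regression, here is a minimal sketch of that cross-validation step, assuming scikit-learn (not part of the original answer). Note that sklearn parameterizes the L1 penalty as $C = 1/\lambda$:

```python
# Sketch (assumes scikit-learn): choose lambda by cross-validation for
# L1-penalized logistic regression. sklearn uses C = 1/lambda, so a
# small C corresponds to a large lambda (stronger shrinkage).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

model = LogisticRegressionCV(
    Cs=np.logspace(-3, 3, 20),  # grid of C = 1/lambda values to search
    penalty="l1",
    solver="liblinear",         # liblinear supports the L1 penalty
    cv=5,
).fit(X, y)

print("selected lambda:", 1.0 / model.C_[0])
print("implied s = ||beta||_1:", np.abs(model.coef_).sum())
```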

  2. How are $\lambda$, $s$, and the shrinking of $\hat{\beta}$ to zero related to each other?

This is answered by points (1) and (2) above: as $\lambda$ increases (equivalently, as $s$ decreases), the coefficients are shrunk more strongly toward zero.

  3. What is the decision process, i.e., how are some $\hat{\beta}$'s shrunk to zero and some are not?

This has to do with the L1 constraint. I highly recommend the geometric representation of this problem on p. 71 of The Elements of Statistical Learning. The L1 constraint makes the feasible region a diamond (in terms of two $\beta$'s, as in the figure below). The contours of the residual sum of squares tend to "hit" the corners of this region, and at a corner some $\beta$'s are exactly 0. That is where the sparsity comes from.

[Figure: contours of the residual sum of squares meeting the diamond-shaped L1 constraint region at a corner, where one coefficient is exactly zero.]
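To make the zeroing fully explicit, consider the special case of an orthonormal design matrix, where the lasso solution has a closed form: soft-thresholding of the OLS coefficients. This is a standard result, not shown in the original answer; the sketch below uses the $\tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$ convention:

```python
# Sketch: soft-thresholding, the closed-form lasso solution when the
# columns of X are orthonormal. OLS estimates inside [-lambda, lambda]
# are set exactly to zero; all others are shrunk toward zero by lambda.
import numpy as np

def soft_threshold(beta_ols, lam):
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

beta_ols = np.array([3.0, -1.5, 0.4, -0.2, 2.1])
print(soft_threshold(beta_ols, lam=0.5))
# approximately [2.5, -1.0, 0.0, -0.0, 1.6]: small estimates hit zero
```

This is exactly the corner-hitting behavior in the figure: coefficients whose OLS estimate is small relative to $\lambda$ end up exactly at zero, while the rest are merely shrunk.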

SixSigma