
We know that the modified LARS algorithm can be used to implement LASSO. However, what is the corresponding $\lambda$? There is no $\lambda$ parameter in LARS.

I found a reference here:

LASSO regularisation parameter from LARS algorithm

which said:

At each iteration, the former algorithm finds an optimal couple $(\beta^*, \lambda^*)$ minimising the regularised loss function: \begin{align} (\beta^*, \lambda^*) = \operatorname{argmin}_{(\beta,\lambda)} L(\beta,\lambda) = \operatorname{argmin}_{(\beta,\lambda)} \Vert y-X\beta \Vert_2^2 + \lambda \Vert \beta \Vert_1 \end{align}

But shouldn't the $\lambda$ minimizing this loss function just be $0$?

And another reference

LASSO: Deriving the smallest lambda at which all coefficient are zero

said that the descent direction in LARS is the direction minimizing the ratio of the change in the squared loss to the change in the $L_1$ norm: $$\dfrac{\nabla_{\vec{s}}\Vert y-X\beta\Vert_2^2}{\nabla_{\vec{s}}\Vert\beta\Vert_1}.$$ From this, I guess $\lambda = 1$ in LARS?

Edit

Here is my understanding of LARS:

  1. We move along the 'angular bisector' (the equiangular direction) of the features in the active set until a new feature appears that makes an equally small angle with the residual.

  2. Then we add this feature to the active set. LARS stops when there is no 'angular bisector' left (after at most min(number of samples, number of features) steps) or when the residual is orthogonal to the feature space.

  3. For each feature, we sum its step lengths $\beta_i^*$ over all steps to get the final solution $\beta^*.$

Regarding the 'entire solution path of pairs': does it mean that, just before each new feature is added to the active set, the current solution $\beta^*_t$ corresponds to a LASSO solution with some regularization coefficient $\lambda^*_t$? That is, as the algorithm proceeds, the corresponding LASSO $\lambda^*_t$ gets smaller and smaller, and more and more coefficients become nonzero.
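A quick way to sanity-check this reading (my own sketch, not part of the original post; it assumes scikit-learn's `lars_path` and its convention of scaling `alpha` by `1/n_samples`) is to print the knots of the path:

```python
# Sketch: lars_path with method='lasso' returns the knots of the lasso path
# as pairs (alpha_t, beta_t), with alpha decreasing along the path.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

X, y = make_regression(n_samples=50, n_features=8, noise=1.0, random_state=0)

alphas, active, coefs = lars_path(X, y, method="lasso")

# As alpha shrinks, coefficients (generally) enter the active set one by one.
for a, beta in zip(alphas, coefs.T):
    print(f"alpha = {a:8.3f}   nonzero coefficients = {np.count_nonzero(beta)}")
```

If this picture is right, `alphas` decreases monotonically and the nonzero count grows by one at (almost) every knot; the lasso modification of LARS also allows a coefficient to leave the active set, in which case the count drops.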

user6703592
  • LARS returns the entire solution path, i.e. all $(\lambda, \hat\beta_\lambda)$ pairs – user551504 Aug 22 '21 at 04:57
  • @user551504 Sorry, I don't understand. There is only a single return value from LARS; how can it return all pairs? We never specify the value of $\lambda$ anywhere in the LARS procedure. – user6703592 Aug 22 '21 at 06:43
  • Yes, a value of $\lambda$ is never specified, and in fact the whole solution path is given. What makes you say "there is only a unique return"? – user551504 Aug 22 '21 at 14:15
  • @user551504 I mean that LARS finally gives a solution $\beta^*,$ and I guess it should correspond to a LASSO solution with some regularization coefficient $\lambda^*,$ right? Then what is this $\lambda^*?$ The final pair $(\beta^*, \lambda^*)$ should be unique, so why do you say there are many pairs? – user6703592 Aug 22 '21 at 15:41
  • @user551504 I think I get your point, please see my update. Am I correct? – user6703592 Aug 22 '21 at 17:12

1 Answer


(Modified) LARS gives a sequence of coefficient estimates, call it $(\hat\beta_1,\hat\beta_2,\hat\beta_3,\dots,\hat\beta_k)$. Lasso gives a path $(\tilde\beta_\lambda,\lambda)$ containing the optimal $\tilde\beta_\lambda$ for any $\lambda$, or equivalently, a path $(t,\tilde\beta_t)$ where $\tilde\beta_t$ minimises the residual sum of squares subject to $\|\tilde\beta_t\|_1\leq t$.

The important claim about LARS and lasso is that

  • every $\hat\beta_i$ is also a $\tilde\beta_{\lambda_i}$ for some decreasing sequence $\lambda_1>\lambda_2>\cdots>\lambda_k$
  • the path $(t,\tilde\beta_t)$ is linear between the values picked out by (modified) LARS

If you want to know which $t$ a given $\hat\beta_i$ corresponds to, it's easy: $t=\|\hat\beta_i\|_1$. If you want to know which $\lambda$ corresponds to a given $\hat\beta_i$, you can work it out from the soft-thresholding (KKT) property: at the optimum, $2X^T(X\hat\beta_i-y)+\lambda s=0$ with $s_j=\operatorname{sign}(\hat\beta_{i,j})$ on the active set, so the elements of $X^T(X\hat\beta_i-y)$ where $\hat\beta_i$ is not zero are all equal to $\lambda/2$ in absolute value.
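As a concrete check (a minimal sketch, not from the answer; it assumes scikit-learn's `lars_path`, whose reported `alphas` are, to my understanding, scaled by `1/n_samples` relative to the loss written above):

```python
# Sketch: at a knot of the (modified) LARS path, the entries of
# X^T (y - X beta) on the active set share a single absolute value, lambda/2.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

X, y = make_regression(n_samples=100, n_features=10, noise=1.0, random_state=0)
alphas, active, coefs = lars_path(X, y, method="lasso")

i = len(alphas) // 2            # pick an interior knot beta_i
beta = coefs[:, i]
corr = X.T @ (y - X @ beta)     # the term appearing in the KKT condition

nonzero = beta != 0
print(np.abs(corr[nonzero]))    # numerically equal on the active set
lam = 2 * np.abs(corr[nonzero]).max()   # lambda for ||y-Xb||^2 + lambda||b||_1

# Assuming sklearn's scaling, this should match 2 * n_samples * alphas[i].
print(lam, 2 * X.shape[0] * alphas[i])
```

The printed absolute correlations agree on the active set, and the $\lambda$ recovered this way decreases from knot to knot, matching the decreasing sequence $(\lambda_i)$ above.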

Thomas Lumley