
I'm teaching myself machine learning, and one point confuses me: if my data contains M feature vectors, how would I use lasso regression to select the most relevant m < M features for my model?

Any response will be appreciated.

user207175
    For any penalty parameter, lasso will estimate $m \le M$ nonzero coefficients. Are you asking about how the optimization works? How to use software? Something else? – Sycorax May 05 '18 at 21:12
  • Just theoretically how lasso works. Would you mind giving me more details about how the lasso selects the most relevant m features? – user207175 May 05 '18 at 21:30
  • Possible duplicate: https://stats.stackexchange.com/questions/45643/why-l1-norm-for-sparse-models/45644#45644 – Sycorax May 05 '18 at 21:47

1 Answer


To select features with the lasso, you can observe which coefficients drop out of the model at various levels of shrinkage. With no shrinkage (a penalty parameter of 0), typically all M coefficients are nonzero, and at a high enough shrinkage all coefficients are zero. In between, there are points where individual coefficients drop out, leaving the top m features in the model. The sequence of fitted coefficients across these shrinkage levels is known as a regularization path.
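For reference, the lasso solves a penalized least-squares problem; one common parameterization (essentially the one glmnet uses for a Gaussian response, up to scaling conventions) is

$$\hat{\beta}(\lambda) = \arg\min_{\beta} \; \frac{1}{2n} \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2 + \lambda \lVert \beta \rVert_1$$

Increasing $\lambda$ increases the shrinkage, and the $\lVert \beta \rVert_1$ penalty sets coefficients to exactly zero rather than merely shrinking them toward zero, which is what makes the lasso useful for feature selection.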

You can visualize the path in R with glmnet, using the iris dataset:

library(glmnet)
data(iris)

# Design matrix; model.matrix adds an intercept column, which appears
# as column 1 in the plot labels below
X <- model.matrix(~ Sepal.Width + Petal.Length + Petal.Width, iris)
y <- iris$Sepal.Length

set.seed(1)  # cv.glmnet assigns cross-validation folds randomly
m_reg <- cv.glmnet(X, y, alpha = 1)  # alpha = 1 is the lasso penalty

# Plot the coefficient paths; labels are the column indices of X
plot(m_reg$glmnet.fit, label = TRUE)

[figure: regularization path for the iris model]

In predicting Sepal.Length, the first feature to drop out of the model (i.e., have its coefficient shrunk to 0) is Petal.Width (label 4); the last to drop out is Petal.Length (label 3). If you want the most parsimonious model with m = 2 features, the lasso therefore gives Sepal.Length ~ Sepal.Width + Petal.Length.
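If you would rather choose the penalty by cross-validation than read the path off the plot, you can inspect which coefficients are nonzero at the penalties cv.glmnet stores. A minimal sketch: lambda.1se is the largest penalty within one standard error of the minimum CV error, so it gives the sparser fit, and glmnet's own intercept will also appear in the list.

# Coefficients at the cross-validated penalty; rows with nonzero
# entries are the selected features (plus the model intercept)
b <- as.matrix(coef(m_reg, s = "lambda.1se"))
rownames(b)[b != 0]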

khol