I'm looking for a non-technical definition of the lasso and what it is used for.
- From Robert Tibshirani's (the author of the original lasso paper) page: [A simple explanation of the Lasso and Least Angle Regression](http://www-stat.stanford.edu/~tibs/lasso/simple.html). – Oct 19 '11 at 04:35
3 Answers
The LASSO (Least Absolute Shrinkage and Selection Operator) is a regression method that involves penalizing the absolute size of the regression coefficients.
By penalizing the sum of the absolute values of the estimates (or, equivalently, constraining it), you end up in a situation where some of the parameter estimates may be exactly zero. The larger the penalty applied, the further the estimates are shrunk towards zero.
This is convenient when we want some automatic feature/variable selection, or when dealing with highly correlated predictors, where standard regression will usually have regression coefficients that are 'too large'.
The Elements of Statistical Learning (free download at https://web.stanford.edu/~hastie/ElemStatLearn/) has a good description of the LASSO and related methods.
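
To see the zeroing behaviour concretely, here is a minimal sketch (not from the original answer) using scikit-learn's `Lasso` on synthetic data; the penalty strength `alpha`, the data, and the true coefficients are arbitrary choices for illustration:

```python
# A minimal sketch of lasso-induced sparsity using scikit-learn.
# The data and the penalty strength (alpha) are arbitrary illustrative choices.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))           # 100 observations, 10 predictors
beta_true = np.array([3, -2, 0, 0, 0, 0, 0, 0, 0, 0], dtype=float)
y = X @ beta_true + rng.normal(scale=0.5, size=100)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)       # alpha plays the role of the penalty

print("OLS coefficients:  ", np.round(ols.coef_, 2))
print("Lasso coefficients:", np.round(lasso.coef_, 2))
# The lasso sets most of the irrelevant coefficients exactly to zero,
# while OLS keeps small non-zero estimates for all of them.
```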

- I'm new to the site; this is precisely the information I was looking for; many thanks. – Paul Vogt Oct 19 '11 at 06:13
- "The larger the penalty applied, the further estimates are shrunk towards zero." Based on @Jbowman's answer in https://stats.stackexchange.com/questions/74542/why-does-the-lasso-provide-variable-selection, I think this quoted statement isn't true? There is a $\lambda_{\text{threshold}}$ to obtain a zero coefficient, but if you keep increasing $\lambda$ so that $\lambda > \lambda_{\text{threshold}}$, then the coefficient should deviate from zero? – roulette01 Jun 29 '20 at 17:08
In "normal" regression (OLS) the goal is to minimize the residual sum of squares (RSS) in order to estimate the coefficients
$$ \underset{\beta \in \mathbb{R}^p}{\operatorname{argmin}} \sum_{i=1}^{n} (Y_{i} - \sum_{j=1}^{p}X_{ij}\beta_{j})^{2} $$
In LASSO regression, the coefficients are estimated with a slightly different criterion:
$$ \underset{\beta \in \mathbb{R}^p}{\operatorname{argmin}} \sum_{i=1}^{n} (Y_{i} - \sum_{j=1}^{p}X_{ij}\beta_{j})^{2} \color{red}{+ \lambda \sum_{j=1}^{p}|\beta_{j}|} $$
The new part is highlighted in red: the sum of the absolute coefficient values weighted by $\lambda$, so $\lambda$ controls the amount of (L1) regularization.
Note that if $\lambda = 0$, we get the same coefficients as with ordinary least squares. The formula shows that in the LASSO case the $\operatorname{argmin}$ has to keep both the RSS and the L1 penalty (the new red part) small. If $\lambda = 1$, the red L1 penalty constrains the size of the coefficients, so a coefficient can only increase if that increase buys a comparable decrease in the RSS. More generally, the only way the coefficients can grow is if doing so is matched by a comparable decrease in the residual sum of squares (RSS).

Thus, the higher you set $\lambda$, the more the coefficients are penalized and the smaller they become; some may become exactly zero. That means LASSO can produce parsimonious models by doing feature selection, and it helps prevent the model from overfitting. That said, LASSO is a good choice if you have many features and your goal is to predict data rather than to interpret the coefficients of your model.
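
As a rough illustration of the formula above, the following sketch sweeps a few values of the penalty on made-up data. Note that scikit-learn's `Lasso` calls the penalty `alpha` and scales the RSS by $1/(2n)$, so its `alpha` is not numerically identical to the $\lambda$ written above; the data and penalty values are arbitrary assumptions for the example.

```python
# Sketch: increasing lambda pulls the coefficients towards zero.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([4.0, 2.0, -3.0, 0.0, 0.0]) + rng.normal(size=200)

for lam in [0.0, 0.1, 0.5, 1.0, 5.0]:
    if lam == 0.0:
        coef = LinearRegression().fit(X, y).coef_  # lambda = 0 is plain OLS
    else:
        coef = Lasso(alpha=lam, max_iter=10_000).fit(X, y).coef_
    print(f"lambda={lam:>4}:", np.round(coef, 2))
# As lambda grows, all coefficients are pulled towards zero and,
# one by one, some become exactly zero.
```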
- Thanks for your answer (+1). This site supports $\TeX$; could you post the formulas in $\TeX$? This would make them readable for visually impaired users. Notice that you can even use colours [as in here](https://stats.stackexchange.com/questions/195044/forecasting-if-the-next-number-is-higher-or-lower/195060#195060) (click "edit" to see the raw answer) and underbraces [as in here](https://stats.stackexchange.com/a/183331/35989) for making similar figures. Thanks. – Tim May 09 '19 at 12:06
- @Tim: Thank you very much for that! It was a great tip to click "edit" in order to see how it is done. – May 09 '19 at 12:27
LASSO regression is a type of regression analysis in which variable selection and regularization occur simultaneously. The method applies a penalty that affects the values of the regression coefficients: as the penalty increases, more coefficients are driven to zero, and vice versa. It uses an L1 penalty whose tuning parameter controls the amount of shrinkage. As the tuning parameter increases, bias increases; as it decreases, variance increases. If the tuning parameter is zero, no coefficients are set to zero, and as it tends to infinity all the coefficients become zero.
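
As a quick sanity check of the claim that a large enough tuning parameter sets every coefficient to zero, here is a small sketch on synthetic data; the `alpha` values are arbitrary choices for illustration:

```python
# Counting non-zero coefficients as the tuning parameter grows.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 4))
y = X @ np.array([2.0, -1.5, 1.0, 0.5]) + rng.normal(size=150)

for alpha in [0.01, 0.5, 2.0, 10.0]:
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>5}: non-zero coefficients = {np.count_nonzero(coef)}")
# For a small alpha all four coefficients stay non-zero; once alpha is
# large enough, every coefficient is exactly zero and only the intercept remains.
```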
