Suppose we have access to an outcome variable $Y_i$ and a $p$-dimensional covariate vector $X_i$ for $i=1,\ldots,N$. We run a LASSO regression of $Y$ on $X$ for a grid of penalty/shrinkage parameters $\lambda$ in ascending order, such that at the smallest $\lambda$ none of the $p$ coefficients is set to zero and at the largest $\lambda$ all $p$ coefficients are set to zero (this is essentially the LASSO path, which can be computed with, e.g., `sklearn.linear_model.lasso_path` in Python).
The question is how to interpret the selection of variables as the penalty parameter increases. Can we interpret the excluded variables as those that correlate least with $Y$?
- For instance, on one hand, when only one coefficient is set to zero, is the corresponding variable the one that correlates least with $Y$?
- On the other hand, when all coefficients but one equal zero, does the remaining non-zero coefficient belong to the variable that correlates most with $Y$?
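To make the question concrete, here is a small sketch (with simulated data, which is my own assumption and not part of the question) that computes the LASSO path with `sklearn.linear_model.lasso_path`, records the order in which variables enter the path as $\lambda$ decreases, and compares it to the ranking by absolute marginal correlation with $Y$:

```python
import numpy as np
from sklearn.linear_model import lasso_path

# Simulated data (assumption: independent standard-normal predictors)
rng = np.random.default_rng(0)
N, p = 200, 5
X = rng.standard_normal((N, p))
beta = np.array([3.0, 2.0, 1.0, 0.5, 0.0])
Y = X @ beta + rng.standard_normal(N)

# lasso_path returns the penalties in *descending* order
# (largest penalty first, where all coefficients are zero)
alphas, coefs, _ = lasso_path(X, Y)  # coefs has shape (p, n_alphas)

# For each variable, find the index of the first (largest) penalty
# at which its coefficient becomes non-zero along the path
first_nonzero = []
for j in range(p):
    nz = np.nonzero(np.abs(coefs[j]) > 1e-12)[0]
    first_nonzero.append(nz[0] if nz.size else coefs.shape[1])
entry_order = np.argsort(first_nonzero)

# Ranking of variables by absolute marginal correlation with Y
abs_corr = np.abs([np.corrcoef(X[:, j], Y)[0, 1] for j in range(p)])
corr_order = np.argsort(-abs_corr)

print("order of entry along the path:", entry_order)
print("order by |corr(X_j, Y)|:     ", corr_order)
```

In this simulated example the two orderings tend to agree because the predictors are (nearly) uncorrelated with each other; the interesting part of the question is whether that agreement is guaranteed in general, e.g. when the columns of $X$ are correlated.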