Suppose we have access to an outcome variable $Y_i$ and a $p$-dimensional covariate vector $X_i$ for $i=1,\ldots,N$. We run a LASSO regression of $Y$ on $X$ for a grid of penalty/shrinkage parameters $\lambda$ in ascending order, such that at the smallest $\lambda$ none of the $p$ coefficients is set to zero and at the largest $\lambda$ all $p$ coefficients are set to zero (this is essentially the LASSO path, which can be computed with, e.g., `sklearn.linear_model.lasso_path` in Python).
The question is how to interpret the selection of variables as the penalty parameter increases. Can we interpret the excluded variables as those that correlate least with $Y$?
- For instance, on one hand, when only one coefficient is set to zero, is the corresponding variable the one that correlates least with $Y$?
- On the other hand, when all coefficients but one equal zero, does the remaining non-zero coefficient belong to the variable that correlates most with $Y$?
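To make the question concrete, here is a small sketch (with simulated data, which is my own assumption and not part of the question) that computes the LASSO path with `sklearn.linear_model.lasso_path`, records the order in which variables enter the path as $\lambda$ decreases, and compares it to the ranking by absolute marginal correlation with $Y$:

```python
import numpy as np
from sklearn.linear_model import lasso_path

# Simulated data (assumption: independent standard-normal predictors)
rng = np.random.default_rng(0)
N, p = 200, 5
X = rng.standard_normal((N, p))
beta = np.array([3.0, 2.0, 1.0, 0.5, 0.0])
Y = X @ beta + rng.standard_normal(N)

# lasso_path returns the penalties in *descending* order
# (largest penalty first, where all coefficients are zero)
alphas, coefs, _ = lasso_path(X, Y)  # coefs has shape (p, n_alphas)

# For each variable, find the index of the first (largest) penalty
# at which its coefficient becomes non-zero along the path
first_nonzero = []
for j in range(p):
    nz = np.nonzero(np.abs(coefs[j]) > 1e-12)[0]
    first_nonzero.append(nz[0] if nz.size else coefs.shape[1])
entry_order = np.argsort(first_nonzero)

# Ranking of variables by absolute marginal correlation with Y
abs_corr = np.abs([np.corrcoef(X[:, j], Y)[0, 1] for j in range(p)])
corr_order = np.argsort(-abs_corr)

print("order of entry along the path:", entry_order)
print("order by |corr(X_j, Y)|:     ", corr_order)
```

In this simulated example the two orderings tend to agree because the predictors are (nearly) uncorrelated with each other; the interesting part of the question is whether that agreement is guaranteed in general, e.g. when the columns of $X$ are correlated.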