What are the scenarios where the lasso is likely to perform better than the elastic net (in out-of-sample prediction)?

user2763361
  • The lasso is a special case of the elastic net. – Stefan Wager Oct 29 '13 at 07:11
  • @StefanWager I'm aware of this. – user2763361 Oct 29 '13 at 07:14
  • Maybe one approach is to ask the opposite question: when does the elastic net beat the lasso by a lot? In general, if there is a group of highly correlated features, the lasso will just pick one of the features and ignore the others. The elastic net, by contrast, will tend to use all of these features a little bit; this can be desirable from a generalization point of view (see the first sketch after this thread). On the other hand, if your design matrix $X$ is close to orthogonal, then there may be less reason to use an elastic net over the lasso. – Stefan Wager Oct 29 '13 at 07:26
  • @StefanWager Agreed. However, my question is specified as it is intentionally: all the sources I have read cover the relative benefits of the elastic net, but never the reverse. – user2763361 Oct 29 '13 at 07:30
  • @StefanWager Under what conditions will the elastic net push coefficients to exactly 0.00000000 (like the lasso), versus the ridge-like solution of a small nonzero value such as 0.0019293? – user2763361 Oct 29 '13 at 07:36
  • The elastic net tends to give sparse solutions, just like the lasso. As you push the "alpha" parameter in glmnet towards 1, the elastic net becomes indistinguishable from the lasso. – Stefan Wager Oct 29 '13 at 19:22
  • @StefanWager What is the effect on generalization error of having two tuning parameters I can twiddle? Instead of trying cross-validation over $N$ candidates, I will now try it over $N \times N$. Would two separate out-of-sample datasets fix this problem (i.e. pick the best lasso, the best elastic net, NNG, LAR, etc., and then pick the best of these on the second OOS dataset)? – user2763361 Oct 30 '13 at 01:37
  • Usually, you'd fix the parameter "alpha" that governs the proportion of L1 and L2 penalties a priori (say, to 0.5), and then tune lambda. Tuning on a 2D grid can be dangerous, because there's a big risk of over-fitting (see the second sketch after this thread). – Stefan Wager Oct 30 '13 at 02:14
  • @StefanWager Agreed. Maybe I will try 3 or 4 alpha choices. – user2763361 Oct 30 '13 at 02:19
  • @StefanWager, in relation to your warning about overfitting on a 2D grid, would you have an answer to [this](http://stats.stackexchange.com/questions/176566/selection-of-alpha-in-elastic-net-overfitting) question? And what do you think about the answer to [this](http://stats.stackexchange.com/questions/173647/grid-fineness-and-overfitting-using-regularization-lasso-ridge-elastic-net) one? My experience has been that people *do* tune on a grid of $\alpha$ and then for each $\alpha$ on a grid of $\lambda$, so effectively on a 2D grid, and that seems to work for them. What is your take on that? – Richard Hardy Mar 13 '17 at 11:49
  • @StefanWager, also [here](http://stats.stackexchange.com/questions/17609/cross-validation-with-two-parameters-elastic-net-case) the first (highest-upvoted) answer suggests cross-validating on a 2D grid. – Richard Hardy Mar 13 '17 at 12:13
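
To make Stefan Wager's points concrete, here is a minimal sketch (my own illustration, not part of the original thread) using scikit-learn in Python, where `l1_ratio` plays the role of glmnet's `alpha` (`l1_ratio=1` recovers the lasso). All data and penalty values below are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
n = 200

# Three nearly identical (highly correlated) copies of one signal,
# plus five pure-noise features.
z = rng.normal(size=n)
X = np.column_stack(
    [z + 0.01 * rng.normal(size=n) for _ in range(3)]
    + [rng.normal(size=n) for _ in range(5)]
)
y = z + 0.5 * rng.normal(size=n)

lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

# The lasso typically concentrates weight on one member of the correlated
# trio; the elastic net tends to spread it across all three.
print("lasso coefficients on the trio:", lasso.coef_[:3].round(3))
print("enet  coefficients on the trio:", enet.coef_[:3].round(3))

# As l1_ratio -> 1 the elastic-net solution approaches the lasso's,
# matching the comment that the two become indistinguishable.
enet_l1 = ElasticNet(alpha=0.1, l1_ratio=0.999).fit(X, y)
print("max |enet - lasso| at l1_ratio=0.999:",
      float(np.abs(enet_l1.coef_ - lasso.coef_).max()))
```

Conversely, on a near-orthogonal design the ridge part of the penalty has little grouping work to do, which is why the advantage of the elastic net over the lasso shrinks there.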

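A second sketch (again my own, hedged illustration) of the tuning compromise discussed above: fix a small handful of mixing values a priori and cross-validate only along the penalty path for each, rather than searching a fine 2D grid. One naming caution: glmnet's `lambda` corresponds to scikit-learn's `alpha`, and glmnet's `alpha` to scikit-learn's `l1_ratio`.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
beta = np.concatenate([rng.normal(size=5), np.zeros(15)])  # sparse truth
y = X @ beta + rng.normal(size=200)

# "3 or 4 alpha choices": a coarse, pre-chosen set of mixing values,
# with cross-validation picking the penalty strength along each path.
model = ElasticNetCV(l1_ratio=[0.25, 0.5, 0.75, 1.0], n_alphas=100, cv=5)
model.fit(X, y)

print("chosen l1_ratio (glmnet's alpha):", model.l1_ratio_)
print("chosen alpha (glmnet's lambda): ", float(model.alpha_))
```

This is still a (coarse) two-dimensional search, as the later comments note people do in practice; keeping the `l1_ratio` grid small is what limits the risk of over-fitting the cross-validation.
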
0 Answers