What are the scenarios where the lasso is likely to perform better than the elastic net (in out-of-sample prediction)?

user2763361
  • The lasso is a special case of the elastic net. – Stefan Wager Oct 29 '13 at 07:11
  • @StefanWager I'm aware of this. – user2763361 Oct 29 '13 at 07:14
  • Maybe one approach is to ask the opposite question: when does the elastic net beat the lasso by a lot? In general, if there is a group of highly correlated features, the lasso will just pick one of the features and ignore the others. The elastic net, by contrast, will tend to use all of these features a little bit; this can be desirable from a generalization point of view (see the first sketch after this thread). On the other hand, if your design matrix $X$ is close to orthogonal, then there may be less reason to use an elastic net over the lasso. – Stefan Wager Oct 29 '13 at 07:26
  • @StefanWager Agreed. However, my question is specified as it is intentionally: all the sources I have read cover the relative benefits of the elastic net, but never the reverse. – user2763361 Oct 29 '13 at 07:30
  • @StefanWager Under what conditions will the elastic net push coefficients to exactly 0.00000000 (like the lasso), versus the ridge-like solution of a small nonzero value such as 0.0019293? – user2763361 Oct 29 '13 at 07:36
  • The elastic net tends to give sparse solutions, just like the lasso. As you push the "alpha" parameter in glmnet towards 1, the elastic net becomes indistinguishable from the lasso. – Stefan Wager Oct 29 '13 at 19:22
  • @StefanWager What is the effect on generalization error of having two tuning parameters I can twiddle? Instead of trying cross-validation over $N$ candidates, I will now try it over $N \times N$. Would two separate out-of-sample datasets fix this problem (i.e. pick the best lasso, the best elastic net, NNG, LAR, etc., and then pick the best of these on the second OOS dataset)? – user2763361 Oct 30 '13 at 01:37
  • Usually, you'd fix the parameter "alpha" that governs the proportion of L1 and L2 penalties a priori (say, to 0.5), and then tune lambda. Tuning on a 2D grid can be dangerous, because there's a big risk of over-fitting (see the second sketch after this thread). – Stefan Wager Oct 30 '13 at 02:14
  • @StefanWager Agreed. Maybe I will try 3 or 4 alpha choices. – user2763361 Oct 30 '13 at 02:19
  • @StefanWager, in relation to your warning about overfitting on a 2D grid, would you have an answer to [this](http://stats.stackexchange.com/questions/176566/selection-of-alpha-in-elastic-net-overfitting) question? And what do you think about the answer to [this](http://stats.stackexchange.com/questions/173647/grid-fineness-and-overfitting-using-regularization-lasso-ridge-elastic-net) one? My experience has been that people *do* tune on a grid of $\alpha$ and then for each $\alpha$ on a grid of $\lambda$, so effectively on a 2D grid, and that seems to work for them. What is your take on that? – Richard Hardy Mar 13 '17 at 11:49
  • @StefanWager, also [here](http://stats.stackexchange.com/questions/17609/cross-validation-with-two-parameters-elastic-net-case) the first (highest-upvoted) answer suggests cross-validating on a 2D grid. – Richard Hardy Mar 13 '17 at 12:13
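
To make Stefan Wager's points concrete, here is a minimal sketch (my own illustration, not part of the original thread) using scikit-learn in Python, where `l1_ratio` plays the role of glmnet's `alpha` (`l1_ratio=1` recovers the lasso). All data and penalty values below are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
n = 200

# Three nearly identical (highly correlated) copies of one signal,
# plus five pure-noise features.
z = rng.normal(size=n)
X = np.column_stack(
    [z + 0.01 * rng.normal(size=n) for _ in range(3)]
    + [rng.normal(size=n) for _ in range(5)]
)
y = z + 0.5 * rng.normal(size=n)

lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

# The lasso typically concentrates weight on one member of the correlated
# trio; the elastic net tends to spread it across all three.
print("lasso coefficients on the trio:", lasso.coef_[:3].round(3))
print("enet  coefficients on the trio:", enet.coef_[:3].round(3))

# As l1_ratio -> 1 the elastic-net solution approaches the lasso's,
# matching the comment that the two become indistinguishable.
enet_l1 = ElasticNet(alpha=0.1, l1_ratio=0.999).fit(X, y)
print("max |enet - lasso| at l1_ratio=0.999:",
      float(np.abs(enet_l1.coef_ - lasso.coef_).max()))
```

Conversely, on a near-orthogonal design the ridge part of the penalty has little grouping work to do, which is why the advantage of the elastic net over the lasso shrinks there.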

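A second sketch (again my own, hedged illustration) of the tuning compromise discussed above: fix a small handful of mixing values a priori and cross-validate only along the penalty path for each, rather than searching a fine 2D grid. One naming caution: glmnet's `lambda` corresponds to scikit-learn's `alpha`, and glmnet's `alpha` to scikit-learn's `l1_ratio`.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
beta = np.concatenate([rng.normal(size=5), np.zeros(15)])  # sparse truth
y = X @ beta + rng.normal(size=200)

# "3 or 4 alpha choices": a coarse, pre-chosen set of mixing values,
# with cross-validation picking the penalty strength along each path.
model = ElasticNetCV(l1_ratio=[0.25, 0.5, 0.75, 1.0], n_alphas=100, cv=5)
model.fit(X, y)

print("chosen l1_ratio (glmnet's alpha):", model.l1_ratio_)
print("chosen alpha (glmnet's lambda): ", float(model.alpha_))
```

This is still a (coarse) two-dimensional search, as the later comments note people do in practice; keeping the `l1_ratio` grid small is what limits the risk of over-fitting the cross-validation.
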
0 Answers