
Has anyone ever applied ridge regression to the model subset selected by a cross-validated lasso? In other words, take a data set with p features and run the lasso, grid-searching to find the optimal penalty parameter. Then record which features survived the selection, and fit a ridge regression using only those features. This approach seems similar to the "relaxed lasso" suggested by Meinshausen (2007) and discussed in this CV thread.

The only result I could find in the literature on using ridge after lasso is this theoretical paper.

My intuition is that if the relaxed lasso's objective is to separate variable selection from coefficient shrinkage, then why not use ridge regression in the second pass? Ridge guarantees that no further variable selection occurs in the second pass, whereas re-running the lasso could drop additional features.
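For concreteness, the two-stage procedure I have in mind can be sketched with scikit-learn (a minimal illustration on a synthetic data set; the specific settings are made up):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV

# Synthetic data: 50 features, only 10 of which are informative.
X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       noise=5.0, random_state=0)

# Stage 1: lasso with its penalty chosen by cross-validation.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)  # indices of surviving features

# Stage 2: ridge on the selected features only. Ridge never zeroes
# coefficients, so there is no further variable selection here,
# only shrinkage of the surviving coefficients.
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X[:, selected], y)

print("features kept by lasso:", len(selected))
print("ridge penalty chosen:", ridge.alpha_)
```

This is just the procedure from the question made explicit, not a reference implementation of the relaxed lasso itself.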

AZhao
    See this thread: https://stats.stackexchange.com/questions/326427 -- some parts of my answer there are relevant. I used ridge after elastic net (ridge+lasso) as an attempt to make something like "relaxed elastic net". – amoeba Jan 09 '19 at 20:49
  • 3
    In this interesting paper [De Mol 2009](https://arxiv.org/abs/0809.1777) the authors use a combination of elastic net followed by ridge regression (which they simply refer to as _regularized least squares_) in order to select nested lists of genes. The underlying idea is that the second ridge step should cope with the well-known _over_-shrinking effect of the naïve elastic net. – fsamu Jan 09 '19 at 21:20

0 Answers