
Since Lasso selects the optimal predictors to include in the model, does this suggest that we don't need to do any of the typical significance testing that comes with OLS regression and logistic regression? I am pretty used to the R output with stars by each regressor, but from talking to people, it seems like in practice, they just optimize lambda in Lasso and then just use those coefficients - and make the implicit assumption that all are significant.

kjetil b halvorsen
confused
    Lasso is model selection, not significance testing. Do you have a specific case where you find a doubtful application of lasso? – abstrusiosity Nov 06 '20 at 20:10
    LASSO doesn't show stars because the hyperparameter is data-optimized, so any hypothesis test would be affected by data snooping, and error control would be lost. If the hyperparameter were known a priori, then intervals and tests would be available via the bootstrap. This is a situation where Bayesian statistics is very useful. – JTH Nov 06 '20 at 20:29
    LASSO and significance tests address VERY different sets of problems. LASSO is for optimizing the bias/variance tradeoff in a predictive model; significance tests are for testing hypotheses in an explanatory model. Put the other way around, significance tests are *not* for feature selection, and LASSO is *not* for hypothesis testing. You're not making the assumption that the coefficients are all significant in a LASSO regression; you just don't care. – Matthew Drury Nov 07 '20 at 00:51
    There is also a misconception here that the Lasso has a high probability of selecting the relevant predictors. Frank Harrell shows that this isn't the case at all: https://www.fharrell.com/talk/stratos19/ (see also [this paper](https://jmlr.org/papers/volume7/zhao06a/)). – COOLSerdash Nov 07 '20 at 10:38
  • It would be extremely helpful if you posted what kind of problem you hope to solve with LASSO regression. If, as the hypothesis testing tag suggests, you want to do something like an ANCOVA, then LASSO might not be the method for you. – Dave Nov 07 '20 at 15:11

1 Answer


Summarizing information provided in comments:

> Lasso selects the optimal predictors to include in the model...

No. LASSO selects a set of predictors that happens to work on a particular data set. There is no assurance that they are "optimal" in any broad sense. This is particularly true when the predictors associated with the outcome are correlated with each other. See this page and the pages there noted as "Linked" and "Related" for details. Try repeating LASSO on multiple bootstrapped samples of a data set, and see how frequently the same predictors are retained in the models.
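Here is a minimal sketch of that bootstrap check. It is written in plain Python so it needs no dependencies. None of the following comes from the original answer; all of it is an illustrative assumption:

- the simulated data: two strongly correlated predictors `x0` and `x1` that both affect the outcome, one weak predictor `x2`, and two pure-noise predictors;
- the fixed penalty `lam=0.1`;
- the hypothetical helpers `lasso_cd` (a bare-bones coordinate-descent LASSO) and `bootstrap_selection_frequency`.

In practice you would use an established implementation such as R's glmnet. A fuller check would also re-select lambda (e.g. by cross-validation) within each bootstrap sample.

```python
import random

def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0), the LASSO shrinkage operator."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Cyclic coordinate descent minimizing (1/(2n))||y - Xb||^2 + lam*||b||_1.
    Assumes the columns of X and y are already centered (no intercept) and that
    predictors are on comparable scales (true for the simulated data below)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # current residual y - X b (b starts at zero)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of x_j with the partial residual (coordinate j added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return b

def center(X, y):
    """Subtract column means from X and the mean from y."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    return [[row[j] - means[j] for j in range(p)] for row in X], [v - y_mean for v in y]

def simulate(n, rng):
    """Hypothetical data: x0 and x1 share a common factor z, so they are highly correlated."""
    X, y = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        row = [z + 0.3 * rng.gauss(0, 1), z + 0.3 * rng.gauss(0, 1),
               rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)]
        X.append(row)
        y.append(row[0] + row[1] + 0.3 * row[2] + rng.gauss(0, 1))
    return X, y

def bootstrap_selection_frequency(X, y, lam, n_boot, rng):
    """Fraction of bootstrap refits in which each predictor gets a nonzero coefficient."""
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        Xc, yc = center([X[i] for i in idx], [y[i] for i in idx])
        for j, bj in enumerate(lasso_cd(Xc, yc, lam)):
            if bj != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]

if __name__ == "__main__":
    rng = random.Random(1)
    X, y = simulate(100, rng)
    freq = bootstrap_selection_frequency(X, y, lam=0.1, n_boot=50, rng=rng)
    for j, f in enumerate(freq):
        print(f"x{j}: selected in {f:.0%} of bootstrap fits")
```

Watch for two patterns in the output: predictors that really matter being retained in well under 100% of refits, and `x0` and `x1` trading places from one refit to the next. Either one is the instability that the bootstrap check is meant to expose.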

> ... we don't need to do any of the typical significance testing that comes with OLS regression and logistic regression

First, if you are mainly interested in prediction, then there is limited need for significance testing. Given the risk of omitted-variable bias, there is little to be gained by omitting any predictor that might reasonably be associated with the outcome, unless you are at risk of overfitting the model. Just because you can't "prove" at p < 0.05 that some predictor is associated with the outcome doesn't mean that it can't help improve predictions.
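The omitted-variable risk can be made concrete with the standard textbook result for linear regression. It is added here only as an illustration. Suppose the true model contains both $x_1$ and $x_2$, but $x_2$ is left out of the fit, so $y$ is regressed (with an intercept) on $x_1$ alone:

$$
\begin{aligned}
&\text{True model:} && y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon,\quad \operatorname{E}[\varepsilon \mid x_1, x_2] = 0\\
&\text{Fit omitting } x_2\text{:} && \hat\beta_1^{\text{short}} \xrightarrow{p} \beta_1 + \beta_2\,\delta_1,\quad \delta_1 = \frac{\operatorname{Cov}(x_1, x_2)}{\operatorname{Var}(x_1)}
\end{aligned}
$$

Unless $\beta_2 = 0$ or $x_2$ is uncorrelated with $x_1$, dropping $x_2$ distorts the coefficient on $x_1$. That happens whether or not $x_2$ would have reached p < 0.05 in the full model.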

Second, with proper care and understanding of what the p-values mean, inference is possible with LASSO. See this page for an introduction to the issues and further links.
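To give a flavor of how valid inference after selection can work, here is a minimal sketch of one simple approach: sample splitting. It is not necessarily the method discussed on the linked page. The procedure has two steps:

1. Run LASSO on one random half of the data.
2. Fit ordinary least squares using only the selected predictors on the other half.

The second half played no part in the selection. Under the usual linear-model assumptions, the OLS tests on that half therefore remain valid for the selected model. The price is that each step uses only half the data.

The simulated data, `lam=0.1`, and all helper names are hypothetical. To stay dependency-free, the p-values use a normal approximation rather than the t distribution.

```python
import math
import random

def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Coordinate-descent LASSO on centered X and y (no intercept), minimizing
    (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    b, r = [0.0] * p, list(y)  # r holds the current residual y - X b
    sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + sq[j] * b[j]
            new = math.copysign(max(abs(rho) - lam, 0.0), rho) / sq[j]  # soft-threshold
            d = new - b[j]
            if d:
                for i in range(n):
                    r[i] -= X[i][j] * d
                b[j] = new
                max_delta = max(max_delta, abs(d))
        if max_delta < tol:
            break
    return b

def center(X, y):
    n, p = len(X), len(X[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = sum(y) / n
    return [[row[j] - mx[j] for j in range(p)] for row in X], [v - my for v in y]

def invert(A):
    """Gauss-Jordan inverse with partial pivoting; fine for small, well-conditioned A."""
    k = len(A)
    M = [list(A[i]) + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        pv = M[c][c]
        M[c] = [v / pv for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * e for a, e in zip(M[r], M[c])]
    return [row[k:] for row in M]

def ols_with_se(X, y):
    """OLS with an intercept; returns (coefficients, standard errors)."""
    n = len(X)
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    ZtZ = [[sum(z[a] * z[c] for z in Z) for c in range(k)] for a in range(k)]
    Zty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(k)]
    inv = invert(ZtZ)
    beta = [sum(inv[a][c] * Zty[c] for c in range(k)) for a in range(k)]
    resid = [yi - sum(za * ba for za, ba in zip(z, beta)) for z, yi in zip(Z, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    return beta, [math.sqrt(sigma2 * inv[a][a]) for a in range(k)]

def split_sample_inference(X, y, lam, rng):
    """Select with LASSO on one random half; test the selected predictors by OLS on the other."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    half = len(idx) // 2
    sel, hold = idx[:half], idx[half:]
    Xs, ys = center([X[i] for i in sel], [y[i] for i in sel])
    chosen = [j for j, bj in enumerate(lasso_cd(Xs, ys, lam)) if bj != 0.0]
    if not chosen:
        return []
    beta, se = ols_with_se([[X[i][j] for j in chosen] for i in hold], [y[i] for i in hold])
    results = []
    for pos, j in enumerate(chosen, start=1):  # position 0 is the intercept
        z = beta[pos] / se[pos]
        # two-sided p-value from a normal approximation: 2 * (1 - Phi(|z|))
        results.append((j, beta[pos], se[pos], math.erfc(abs(z) / math.sqrt(2))))
    return results

if __name__ == "__main__":
    rng = random.Random(2)
    X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(200)]
    y = [1.0 * row[0] + 0.5 * row[1] + rng.gauss(0, 1) for row in X]
    for j, coef, se, p in split_sample_inference(X, y, lam=0.1, rng=rng):
        print(f"x{j}: estimate {coef:.3f}, SE {se:.3f}, two-sided p ~ {p:.3g}")
```

These p-values answer a narrower question than the usual regression stars. Each one tests a coefficient within the model that LASSO happened to select on the first half. Had the split been different, the selected model might have been different too.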

EdM