
I am trying to fit a predictive gene-based model in survival analysis. My question is:

Can I use LASSO as a variable selection method, and then run a multivariate Cox regression to get the coefficients of those variables (genes) instead of using the coefficients from LASSO?

In that case (if I run a multivariate Cox regression), I can get p-values and make inferences about the coefficients.
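Below is a minimal sketch of the two-stage idea in the question, written in Python with NumPy rather than with the R packages mentioned in this thread (glmnet, penalized). It rests on several assumptions. The data are simulated, with 10 hypothetical genes of which only the first three matter. Ties are handled with the Breslow approach. The L1-penalized Cox model is fitted by plain proximal gradient descent (ISTA), standing in for the coordinate-descent solvers such packages use. The penalty `lam` is fixed and untuned; in practice it would be chosen by cross-validation. All function names are hypothetical. Note that the stage-2 p-values are computed as if the genes had been chosen in advance, so they ignore the selection step. That is exactly the concern raised in the comments.

```python
import math
import numpy as np


def _risk_set_sums(X, time, beta, second_order=False):
    """Sums over each subject's risk set {j : time[j] >= time[i]} (Breslow ties).

    Weights w_j = exp(x_j' beta) are rescaled by a common constant, which cancels
    in every ratio used below.
    """
    order = np.argsort(time, kind="stable")
    Xs = X[order]
    eta = Xs @ beta
    w = np.exp(eta - eta.max())
    S0 = np.cumsum(w[::-1])[::-1]                              # reverse cumulative sums
    S1 = np.cumsum((w[:, None] * Xs)[::-1], axis=0)[::-1]
    start = np.searchsorted(time[order], time, side="left")    # first sorted row still at risk
    S2 = None
    if second_order:
        outer = w[:, None, None] * Xs[:, :, None] * Xs[:, None, :]
        S2 = np.cumsum(outer[::-1], axis=0)[::-1][start]
    return S0[start], S1[start], S2


def cox_gradient(beta, X, time, event):
    """Gradient of the negative Cox log partial likelihood, summed over observed events."""
    S0, S1, _ = _risk_set_sums(X, time, beta)
    d = event.astype(bool)
    return -(X[d] - S1[d] / S0[d][:, None]).sum(axis=0)


def lasso_cox(X, time, event, lam, n_iter=3000, tol=1e-10):
    """L1-penalized Cox fit by ISTA: minimizes (1/n)*[-log PL] + lam*||beta||_1."""
    n, p = X.shape
    # Fixed step 1/L; L = (#events / n) * max_j ||x_j||^2 bounds the loss curvature
    step = n / (event.sum() * np.max(np.sum(X ** 2, axis=1)))
    beta = np.zeros(p)
    for _ in range(n_iter):
        u = beta - step * cox_gradient(beta, X, time, event) / n
        new = np.sign(u) * np.maximum(np.abs(u) - step * lam, 0.0)  # soft-thresholding
        if np.max(np.abs(new - beta)) < tol:
            return new
        beta = new
    return beta


def cox_refit(X, time, event, n_iter=25):
    """Unpenalized Cox fit by Newton-Raphson, with Wald standard errors and p-values."""
    d = event.astype(bool)

    def grad_hess(beta):
        S0, S1, S2 = _risk_set_sums(X, time, beta, second_order=True)
        xbar = S1[d] / S0[d][:, None]
        grad = -(X[d] - xbar).sum(axis=0)
        hess = (S2[d] / S0[d][:, None, None]).sum(axis=0) - xbar.T @ xbar
        return grad, hess

    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad, hess = grad_hess(beta)
        beta = beta - np.linalg.solve(hess, grad)
    _, hess = grad_hess(beta)
    se = np.sqrt(np.diag(np.linalg.inv(hess)))                       # inverse observed information
    pvals = np.array([math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta, se)])
    return beta, se, pvals


# Simulated data (hypothetical): hazard exp(x' beta) with only genes 0-2 active
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [1.0, -0.8, 0.6]
T = rng.exponential(1.0 / np.exp(X @ true_beta))   # event times
C = rng.exponential(2.0, size=n)                   # independent censoring times
time, event = np.minimum(T, C), (T <= C).astype(int)

lam = 0.05                                         # illustrative, not tuned
beta_lasso = lasso_cox(X, time, event, lam)        # stage 1: selection
selected = np.flatnonzero(beta_lasso)
coef, se, pvals = cox_refit(X[:, selected], time, event)  # stage 2: unpenalized refit
for j, b, s, pv in zip(selected, coef, se, pvals):
    print(f"gene {j}: coef={b:.3f} se={s:.3f} naive p={pv:.3g}")
```

The refit coefficients are not shrunk, which is the appeal of the two-stage approach. Their standard errors and p-values, however, condition on a gene set that was picked using the same data.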

I am pretty new to this area. Please bear with me if my questions are a bit dumb.

Jenny
  • Good question (+1). I've been told that this approach is not uncommon. My guess is that it's somehow questionable. – miura Aug 06 '12 at 17:15
  • Was it a good question? :) Thanks!! – Jenny Aug 06 '12 at 19:47
  • Yes, actually we're doing exactly that in our current project: variable selection with the LASSO and everything else with an unregularized Cox model. My impression is that this is not really correct, but there is a practical necessity because the LASSO cannot provide confidence intervals. However, I'd expect the confidence intervals that a Cox model gives for variables selected by the LASSO to be biased as well, essentially because of multiple testing. I'd really love to read the opinion of some of the silverbacks here on this. – miura Aug 07 '12 at 07:46
  • I would use LASSO regularised Cox regression and then use bootstrapping to generate the confidence intervals. I would have thought that the two-stage approach would give invalid confidence intervals as the same data has already been used to select the features prior to application of Cox regression. – Dikran Marsupial Aug 07 '12 at 11:10
  • There is an R package, crrp, which gives you confidence intervals. – Hello Aug 05 '19 at 17:09
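The bootstrap suggested in the comments above can be sketched as follows, under the same kind of assumptions as before: simulated data, hypothetical function names, Breslow ties, a simple ISTA solver and a fixed, untuned penalty `lam`. The key design choice is that the LASSO fit, and therefore the variable selection, is redone on every resample, so selection variability feeds into the spread. Percentile intervals around a shrunken estimator are still only approximate. They inherit the shrinkage toward zero and have no guaranteed coverage, so treat the output as a rough guide rather than exact inference.

```python
import numpy as np


def cox_gradient(beta, X, time, event):
    """Gradient of the negative Cox log partial likelihood (Breslow ties), summed over events."""
    order = np.argsort(time, kind="stable")
    Xs = X[order]
    eta = Xs @ beta
    w = np.exp(eta - eta.max())                               # rescaling cancels in the ratio
    S0 = np.cumsum(w[::-1])[::-1]                             # sum of w_j over {j : t_j >= t}
    S1 = np.cumsum((w[:, None] * Xs)[::-1], axis=0)[::-1]     # sum of w_j x_j over the same set
    start = np.searchsorted(time[order], time, side="left")   # first sorted row at risk
    d = event.astype(bool)
    xbar = S1[start][d] / S0[start][d][:, None]
    return -(X[d] - xbar).sum(axis=0)


def lasso_cox(X, time, event, lam, n_iter=3000, tol=1e-8):
    """L1-penalized Cox fit by ISTA: minimizes (1/n)*[-log PL] + lam*||beta||_1."""
    n, p = X.shape
    step = n / (event.sum() * np.max(np.sum(X ** 2, axis=1)))  # 1/L, safe curvature bound
    beta = np.zeros(p)
    for _ in range(n_iter):
        u = beta - step * cox_gradient(beta, X, time, event) / n
        new = np.sign(u) * np.maximum(np.abs(u) - step * lam, 0.0)  # soft-thresholding
        if np.max(np.abs(new - beta)) < tol:
            return new
        beta = new
    return beta


def bootstrap_lasso_cox(X, time, event, lam, n_boot=50, seed=1):
    """Refit the L1-penalized Cox model on subjects resampled with replacement."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                      # resample whole subjects
        draws[b] = lasso_cox(X[idx], time[idx], event[idx], lam)  # selection redone each time
    return draws


# Simulated data (hypothetical): only genes 0-2 affect the hazard
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [1.0, -0.8, 0.6]
T = rng.exponential(1.0 / np.exp(X @ true_beta))
C = rng.exponential(2.0, size=n)
time, event = np.minimum(T, C), (T <= C).astype(int)

lam = 0.05                                           # illustrative, not tuned
beta_hat = lasso_cox(X, time, event, lam)
draws = bootstrap_lasso_cox(X, time, event, lam)
lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)   # percentile intervals
freq = (draws != 0).mean(axis=0)                     # selection frequency per gene
for j in range(p):
    print(f"gene {j}: est={beta_hat[j]:.3f} interval=({lo[j]:.3f}, {hi[j]:.3f}) selected {freq[j]:.0%}")
```

The selection frequency is a cheap by-product. It gives a rough sense of how stable the chosen gene set is; 50 resamples keep the demo fast, and real analyses typically use many more.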

1 Answer


It would be better to perform a Cox regression with an L1 regularisation term, which gives the same type of variable selection you get from the standard least-squares LASSO approach. I seem to recall at least one paper on this in the journal "Bioinformatics". There is also a good paper by Robert Tibshirani, and @miura says that this is implemented in glmnet.
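For reference, here is a sketch of the two objectives being compared. It uses one common scaling; packages differ in how they scale the loss and λ. The first line is the original least-squares LASSO. The second is its Cox counterpart, where δ_i = 1 marks an observed event, t_i is the follow-up time and the inner sum runs over the risk set at t_i (Breslow handling of ties). Both share the same L1 penalty, and that penalty is what sets some coefficients exactly to zero.

```latex
\hat\beta_{\mathrm{LS}}(\lambda) = \arg\min_{\beta}\; \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - x_i^\top\beta\bigr)^2 + \lambda\sum_{k=1}^{p}\lvert\beta_k\rvert

\hat\beta_{\mathrm{Cox}}(\lambda) = \arg\min_{\beta}\; -\frac{1}{n}\sum_{i:\,\delta_i = 1}\Bigl[x_i^\top\beta - \log\sum_{j:\,t_j \ge t_i}\exp\bigl(x_j^\top\beta\bigr)\Bigr] + \lambda\sum_{k=1}^{p}\lvert\beta_k\rvert
```

Setting λ = 0 in the second line gives the ordinary Cox partial-likelihood fit; larger values of λ set more coefficients exactly to zero.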

Dikran Marsupial
  • So last week, when I used an L1-regularized Cox model (the R packages penalized and glmnet provide this), it wasn't actually a LASSO? – miura Aug 06 '12 at 17:13
  • LASSO stands for "least absolute shrinkage and selection operator", so it refers to the L1 penalty term, and I'd say it was a LASSO method whatever the loss. I'll edit my answer slightly. – Dikran Marsupial Aug 06 '12 at 17:16
  • Dikran and miura, thanks a lot for your answers. I should definitely read the paper you mentioned. Really appreciate your help! – Jenny Aug 06 '12 at 19:46