
I have a dataset containing information about patients in a hospital, with the following variables:

  • Status for a certain disease (binary outcome)
  • Hundreds of continuous biomarkers
  • A few variables for adjustment (age, gender, etc.)
  • Patient ID

My objective is to find biomarkers that are associated with the disease. I have more biomarkers than observations, so I thought of ridge/LASSO logistic regression. But I also need to take into account that I have multiple samples collected from the same patient, so I need to include Patient ID as a random effect.

The problem is that I am also expected to provide p-values (or maybe posterior probabilities) for each biomarker. The R packages that I tried did not provide this info.

I thought of doing permutation tests, but this would be very time-consuming: I have many biomarkers, and the p-values would have to be precise enough for me to apply multiple-testing corrections afterwards.
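To make the cost concrete, here is a minimal sketch of what I have in mind, not a finished method. It assumes disease status is constant within a patient, shuffles labels at the patient level so that repeated samples stay together, and uses a plain difference in patient-level means as the test statistic. That statistic is a stand-in: it is not a penalised mixed-effects logistic regression, and it ignores the adjustment variables. The function names `min_permutations` and `patient_level_permutation_pvalues` are hypothetical, and only NumPy is used.

```python
import numpy as np


def min_permutations(n_tests, alpha=0.05):
    """Smallest number of permutations B whose minimum attainable p-value,
    1 / (B + 1), can still pass a Bonferroni threshold of alpha / n_tests."""
    return int(np.ceil(n_tests / alpha)) - 1


def patient_level_permutation_pvalues(X, y, patient_ids, n_perm=999, seed=0):
    """Permutation p-values for each biomarker (columns of X).

    Assumes y is binary (0/1) and constant within each patient.
    Samples are first averaged per patient, then labels are permuted
    across patients, which respects the repeated-measures structure.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    patient_ids = np.asarray(patient_ids)

    # Map every sample to a patient index 0..n_patients-1.
    patients, inverse = np.unique(patient_ids, return_inverse=True)
    n_patients = len(patients)
    counts = np.bincount(inverse, minlength=n_patients)

    # Patient-level biomarker means.
    X_pat = np.zeros((n_patients, X.shape[1]))
    np.add.at(X_pat, inverse, X)
    X_pat /= counts[:, None]

    # Patient-level status; must be exactly 0 or 1 under the assumption above.
    y_pat = np.bincount(inverse, weights=y, minlength=n_patients) / counts
    if not np.all(np.isin(y_pat, (0.0, 1.0))):
        raise ValueError("status must be constant within each patient")
    y_pat = y_pat.astype(bool)

    def statistic(labels):
        # Absolute difference in means between the two groups (two-sided).
        return np.abs(X_pat[labels].mean(axis=0) - X_pat[~labels].mean(axis=0))

    observed = statistic(y_pat)
    rng = np.random.default_rng(seed)
    exceed = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_perm):
        exceed += statistic(rng.permutation(y_pat)) >= observed

    # The +1 in numerator and denominator keeps p-values strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

The smallest p-value this can return is 1 / (n_perm + 1), so with hundreds of biomarkers and a Bonferroni-style correction, `min_permutations` suggests on the order of ten thousand permutations per biomarker, and that is before refitting any regularised model inside each permutation.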

Answers with R/Python code or references would be appreciated. Thank you in advance!

PedroSebe
  • (Unfortunately?) We do not get p-values associated with regularised coefficients. The reason is that the resulting coefficient estimates are (potentially severely) biased in order to reduce their variance, which means that their associated standard errors will be smaller than they would normally be. CV.SE has two very relevant questions/discussions [here](https://stats.stackexchange.com/questions/224796) and [here](https://stats.stackexchange.com/questions/2121). – usεr11852 Mar 28 '21 at 03:00

0 Answers