
My dataset has many biomarkers, and boxplots of these variables show many outliers. However, these 'outliers' are real data, not misread observations. I want to use elastic net logistic regression to assess the association of these biomarkers with a binary outcome. Because my sample size is small and some biomarkers are correlated, I need some regularization. However, I'm not sure whether outliers are a problem for elastic net logistic regression (I assume they will affect the fit adversely, because the estimation is still likelihood based). The only robust version of elastic net regression I could find is the R package enetLTS.

I'd like to know whether it is better to use this technique or whether there are other methods that I didn't find. Is there a package that can help identify outliers and influential observations appropriately, so that I can remove them and rerun the regression (if that is advisable at all)? I have found a link showing that fitting a GLM with the selected variables sometimes fails to identify the outliers.
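To make the setup concrete, here is a minimal pure-Python sketch of an elastic net penalized logistic fit (plain proximal gradient descent), followed by Pearson residuals as one crude diagnostic. It is not enetLTS and not any package's API: the function names (`fit_enet_logistic`, `pearson_residuals`), the toy data, and the tuning values `lam`, `alpha`, and `lr` are all assumptions made up for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(z, t):
    # Proximal operator of t * |z| (the lasso part of the penalty).
    return math.copysign(max(abs(z) - t, 0.0), z)

def linear_predictor(x, b0, b):
    return b0 + sum(bj * xj for bj, xj in zip(b, x))

def fit_enet_logistic(X, y, lam=0.05, alpha=0.5, lr=0.1, n_iter=2000):
    """Minimise mean logistic loss + lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2).

    Hypothetical helper: the intercept b0 is left unpenalised.
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(linear_predictor(xi, b0, b)) - yi
            g0 += r / n
            for j in range(p):
                g[j] += r * xi[j] / n
        b0 -= lr * g0
        # Gradient step on the smooth part (loss + ridge), then soft-threshold.
        b = [
            soft_threshold(bj - lr * (gj + lam * (1.0 - alpha) * bj), lr * lam * alpha)
            for bj, gj in zip(b, g)
        ]
    return b0, b

def pearson_residuals(X, y, b0, b, eps=1e-12):
    # (y - p) / sqrt(p * (1 - p)); p is clamped away from 0 and 1.
    out = []
    for xi, yi in zip(X, y):
        p = min(max(sigmoid(linear_predictor(xi, b0, b)), eps), 1.0 - eps)
        out.append((yi - p) / math.sqrt(p * (1.0 - p)))
    return out

# Toy data (made up): two "biomarkers", the last row has an extreme first value.
X_toy = [[-2.0, 0.3], [-1.0, -0.5], [-0.5, 0.8], [0.5, -0.2], [1.0, 0.4], [8.0, -0.6]]
y_toy = [0, 0, 0, 1, 1, 1]

b0_hat, b_hat = fit_enet_logistic(X_toy, y_toy)
residuals = pearson_residuals(X_toy, y_toy, b0_hat, b_hat)
```

Residuals of this kind flag observations whose outcome is surprising given the fit, not observations that are merely extreme in the biomarkers, so they would not necessarily single out the last row here. In practice the biomarkers would also be standardized first, since the penalty depends on the scale of each predictor.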

kjetil b halvorsen
Blain Waan
  • See [leverage-values-for-ridge-regression](https://stats.stackexchange.com/questions/315780/what-are-the-leverage-values-for-ridge-regression/316306#316306) and https://www.osti.gov/pages/servlets/purl/1337112, https://www.sciencedirect.com/science/article/abs/pii/S1226319214000751 – kjetil b halvorsen May 14 '19 at 09:05
  • Thanks for the links. Do you know about any software implementation (e.g. available R packages)? – Blain Waan May 14 '19 at 19:51
  • Sorry, but I don't know of any implementations – kjetil b halvorsen May 15 '19 at 18:19

0 Answers