I have a logistic regression model with L1 regularization, fit with the glmnet package. I built this model using only 1% of the total data available to me. To make sure the variance of my results is small, I am now planning to bootstrap the available data. What I am not sure about is how to "aggregate" the bootstrap fits into the final coefficients that I will use for "predict".

Is this the right approach? If not, what are some other ways to go about this?

Dee
  • I think it would be better to bootstrap the predictions from each model. That way, you at least keep the individual predictions and can use them to build bootstrapped confidence intervals. – Demetri Pananos Apr 28 '20 at 12:55
  • I thought about this carefully, and my conclusion is that we cannot average the coefficients. Lasso tends to choose one variable out of two highly correlated variables, but we do not know which one it will choose. So when we average two models, we might be averaging over mutually exclusive sets of variables, and averaging the coefficients does not make sense. https://stats.stackexchange.com/questions/402267/how-to-obtain-confidence-intervals-for-a-lasso-regression – Dee Apr 28 '20 at 13:09
  • Tell us why you used only 1% of the data! If you want model averaging, then ridge will be closer ... – kjetil b halvorsen Apr 28 '20 at 15:19
  • I have around 300 million rows. Because of computational limitations, I ended up choosing around 2-3 million rows. – Dee Apr 28 '20 at 15:48
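
To make the suggestion in the comments concrete, here is a minimal sketch of bootstrapping the predictions rather than the coefficients. It is written in plain Python as a stand-in for glmnet, and everything in it is an assumption made for illustration: the toy data, the penalty `lam`, the step size, and the helper names `fit_l1_logistic`, `bootstrap_models`, and `bagged_prediction` are all hypothetical, and the fitting routine is a simple proximal gradient loop, not glmnet's algorithm. Each bootstrap sample gets its own L1-penalised fit; for a new point the per-model probabilities are averaged and their spread gives a rough percentile interval.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of the L1 penalty: shrink v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.05, step=0.5, iters=150):
    # Toy L1-penalised logistic regression via proximal gradient descent.
    # lam, step and iters are illustrative values, not tuned ones.
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            resid = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += resid
            for j in range(p):
                grad_w[j] += resid * xi[j]
        b -= step * grad_b / n  # the intercept is not penalised
        w = [soft_threshold(wj - step * gj / n, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def bootstrap_models(X, y, n_boot=20, seed=0):
    # Fit one model per bootstrap resample (rows drawn with replacement).
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_l1_logistic([X[i] for i in idx],
                                      [y[i] for i in idx]))
    return models


def bagged_prediction(models, x, alpha=0.05):
    # Average the per-model probabilities and report a percentile interval.
    preds = sorted(predict_proba(m, x) for m in models)
    last = len(preds) - 1
    lo = preds[math.floor((alpha / 2) * last)]
    hi = preds[math.ceil((1 - alpha / 2) * last)]
    return sum(preds) / len(preds), lo, hi


def make_toy_data(n=120, seed=1):
    # Hypothetical data: x2 is almost a copy of x1, x3 is pure noise.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = x1 + 0.05 * rng.gauss(0, 1)
        x3 = rng.gauss(0, 1)
        X.append([x1, x2, x3])
        y.append(1 if rng.random() < sigmoid(1.5 * x1) else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    models = bootstrap_models(X, y)
    mean_p, lo, hi = bagged_prediction(models, [1.0, 1.0, 0.0])
    print(f"bagged probability {mean_p:.3f}, interval [{lo:.3f}, {hi:.3f}]")
```

Because only predictions are combined, it does not matter if different resamples select different members of a correlated pair of variables, which is the concern raised about averaging coefficients. The price is that all bootstrap models have to be kept around at prediction time; with glmnet, the analogous step would be to call `predict` on each refit and average the results.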
