Discrepancies using Logit-LASSO in R/Stata

Question

I am running a Logit-LASSO model to predict a binary outcome. In R (glmnet), every prediction over the test sample is 1. When I fit the same model in Stata (lassopack), the predictions vary enough that I get both 1's and 0's, as in the real data.

Note that the predictions are continuous (the model outputs a probability, even though the outcome is a binary variable). Afterwards I convert the predicted probabilities into 0's and 1's. In the second figure, all probabilities are >0.5, so R predicts only 1's.
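To make the conversion step concrete, here is a minimal Python sketch of thresholding at 0.5. The function name `to_labels` and both probability vectors are made up for illustration; they are not taken from the actual R or Stata output. It only shows why a set of probabilities that all sit above the cutoff yields nothing but 1's.

```python
def to_labels(probs, threshold=0.5):
    """Convert predicted probabilities to 0/1 labels at a fixed cutoff."""
    return [1 if p > threshold else 0 for p in probs]

# Hypothetical predicted probabilities (illustrative values only).
probs_all_high = [0.62, 0.71, 0.55, 0.90]  # every value is above 0.5
probs_spread = [0.12, 0.71, 0.45, 0.90]    # values on both sides of 0.5

labels_all_high = to_labels(probs_all_high)
labels_spread = to_labels(probs_spread)

print(labels_all_high)
print(labels_spread)
```

The labels depend only on which side of the cutoff each probability falls, so two models with quite different probability distributions can still be compared on the probabilities themselves before any conversion.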

Are such discrepancies usual in your experience? How can I standardise the process so that I can compare the predictions from Stata and R?

Thank you.

The prediction by a logistic regression *is* a probability, not a discrete category. — Dave, Jul 14 '21 at 13:36
+1 to Dave. Don't blindly convert probabilistic predictions to categories. [You need to take costs of wrong decisions into account.](https://stats.stackexchange.com/a/312124/1352) Better to stick with probabilistic predictions and evaluate these using [proper scoring rules](https://stats.stackexchange.com/tags/scoring-rules/info). — Stephan Kolassa, Jul 14 '21 at 13:37
Thank you very much for your comments. The densities shown in the figures are of the predicted probabilities before the transformation to 0's and 1's. Even so, we can observe huge disparities in the predicted probabilities. How can we resolve these discrepancies? — vog, Jul 14 '21 at 16:47
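As a rough illustration of the proper scoring rules mentioned in the comments, the sketch below computes the Brier score and the log loss for two sets of predicted probabilities on the same test outcomes. Everything here is an assumption for illustration: the outcome vector, the two probability vectors (named `probs_r` and `probs_stata` only to mirror the question), and the helper names. In practice the probabilities would be exported from glmnet and lassopack for the same test observations.

```python
import math

def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)

def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

# Hypothetical test outcomes and predicted probabilities (not real output).
y_test = [1, 0, 1, 0]
probs_r = [0.8, 0.7, 0.9, 0.6]      # all above 0.5, as described for R
probs_stata = [0.8, 0.2, 0.9, 0.3]  # spread on both sides of 0.5

for name, probs in [("R", probs_r), ("Stata", probs_stata)]:
    print(name, brier_score(y_test, probs), log_loss(y_test, probs))
```

Lower is better for both scores. Because they are computed on the probabilities directly, no 0/1 cutoff is needed to compare the two sets of predictions.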
