I am running a logit LASSO model to predict a binary outcome. In R (glmnet), every prediction over the test sample is 1. When I fit the same model in Stata (lassopack), the predictions vary enough that I get both 1's and 0's, as in the real data.

[Figures: densities of the predicted probabilities from the two fits.]

Note that the predictions are continuous (the model outputs a probability, even though the outcome is a binary variable). Afterwards I convert the predicted probabilities into 0's and 1's. In the second figure, all probabilities are >0.5, so R only predicts 1's.
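A minimal sketch of the conversion step described above, assuming a cutoff of 0.5. The probabilities and the helper name `to_labels` are hypothetical and are not taken from the actual data or from either package. It shows why a set of probabilities that all lie above the cutoff collapses to all 1's, and how the labels depend on the cutoff chosen.

```python
# Hypothetical predicted probabilities; not taken from the question's data.
probs = [0.62, 0.71, 0.55, 0.90, 0.58]


def to_labels(probabilities, threshold=0.5):
    """Map each probability to 1 if it exceeds the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probabilities]


# Every probability is above 0.5, so every label becomes 1.
labels_default = to_labels(probs)

# A different (equally arbitrary) cutoff changes the labels.
labels_stricter = to_labels(probs, threshold=0.6)

print(labels_default)
print(labels_stricter)
```

The labels are a function of both the probabilities and the cutoff, so two fits can produce different labels even when their probabilities are fairly close.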

Are such discrepancies usual in your experience? How can I standardise the process so that I can compare the predictions from Stata and R?

Thank you.

vog
  • The prediction by a logistic regression *is* a probability, not a discrete category. – Dave Jul 14 '21 at 13:36
  • +1 to Dave. Don't blindly convert probabilistic predictions to categories. [You need to take costs of wrong decisions into account.](https://stats.stackexchange.com/a/312124/1352) Better to stick with probabilistic predictions and evaluate these using [proper scoring rules](https://stats.stackexchange.com/tags/scoring-rules/info). – Stephan Kolassa Jul 14 '21 at 13:37
  • Thank you very much for your comments. The densities shown in the figures are of the predicted probabilities before the transformation to 0's and 1's. Moreover, they show huge disparities in the predicted probabilities. How can we resolve these discrepancies? – vog Jul 14 '21 at 16:47
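As a rough illustration of the proper scoring rules suggested in the comments, the sketch below computes the Brier score and the log loss directly on predicted probabilities, without thresholding. The outcome vector and the two probability vectors are made-up stand-ins for the test-sample predictions exported from two fits; the function names are illustrative and do not come from glmnet or lassopack.

```python
import math


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative Bernoulli log-likelihood; probabilities are clipped
    away from 0 and 1 so the logarithm stays finite."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical test outcomes and predictions from two fits (assumed data).
y_test = [1, 0, 1, 1, 0]
probs_a = [0.8, 0.3, 0.7, 0.9, 0.2]
probs_b = [0.7, 0.6, 0.6, 0.8, 0.6]

brier_a, brier_b = brier_score(y_test, probs_a), brier_score(y_test, probs_b)
loss_a, loss_b = log_loss(y_test, probs_a), log_loss(y_test, probs_b)

print(f"Brier score: a={brier_a:.3f}, b={brier_b:.3f}")
print(f"Log loss:    a={loss_a:.3f}, b={loss_b:.3f}")
```

Lower is better for both scores. In this made-up example, thresholding `probs_b` at 0.5 would give all 1's, yet the scores still measure how far those probabilities are from the outcomes.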

0 Answers