I have a class imbalance problem in my dataset. Due to some constraints, I cannot use any classifier other than logistic regression in sklearn.

Given that, can I impose a penalty for misclassifying the minority class to improve the model?

kjetil b halvorsen
  • Logistic regression does not hard-classify data points; it estimates probabilities. You can impose misclassification costs when you choose a threshold on the probabilities. – Matthew Drury Apr 27 '18 at 00:34
  • Let's say I have a threshold of 0.5. How do I go after this using sklearn? – Rishabh Agarwal jain Apr 27 '18 at 10:06
  • No, you CHOOSE the threshold based on the misclassification costs. You need to write down a matrix of the various misclassification costs, then draw a curve of the expected cost at various thresholds. You can then choose the optimal threshold. Sklearn provides a `predict` method that arbitrarily thresholds at 0.5, and you should never use it. – Matthew Drury Apr 27 '18 at 14:19
  • See also https://stats.stackexchange.com/questions/127042/why-isnt-logistic-regression-called-logistic-classification – kjetil b halvorsen Dec 01 '18 at 20:55
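
The procedure described in the comments (write down the misclassification costs, compute the expected cost at a range of thresholds, pick the minimum) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `expected_cost_curve`, the cost values (a false negative costing five times a false positive), and the small validation arrays are all hypothetical. In practice the probabilities would come from a fitted sklearn model via `predict_proba(X_val)[:, 1]` rather than being typed in by hand.

```python
import numpy as np


def expected_cost_curve(y_true, p_pos, cost_fp, cost_fn, thresholds):
    """Average misclassification cost per sample at each candidate threshold.

    Correct predictions are assumed to cost nothing; only false positives
    and false negatives are charged.
    """
    y_true = np.asarray(y_true)
    p_pos = np.asarray(p_pos)
    costs = []
    for t in thresholds:
        y_pred = (p_pos >= t).astype(int)
        fp = np.sum((y_pred == 1) & (y_true == 0))  # negatives flagged as positive
        fn = np.sum((y_pred == 0) & (y_true == 1))  # positives that were missed
        costs.append((cost_fp * fp + cost_fn * fn) / len(y_true))
    return np.array(costs)


# Hypothetical validation labels and predicted probabilities of the
# minority (positive) class; stand-ins for model.predict_proba(X_val)[:, 1].
y_val = np.array([0, 0, 0, 0, 0, 0, 1, 1])
p_val = np.array([0.03, 0.12, 0.22, 0.31, 0.37, 0.62, 0.42, 0.83])

# Assumed costs: missing a minority case is 5x worse than a false alarm.
thresholds = np.linspace(0.05, 0.95, 19)
costs = expected_cost_curve(y_val, p_val, cost_fp=1.0, cost_fn=5.0,
                            thresholds=thresholds)

# Pick the threshold with the lowest expected cost and classify with it,
# instead of relying on the fixed 0.5 cutoff.
best_threshold = thresholds[np.argmin(costs)]
y_pred = (p_val >= best_threshold).astype(int)
print(f"best threshold: {best_threshold:.2f}, expected cost: {costs.min():.3f}")
```

With these made-up numbers the cheapest threshold falls below 0.5, because missing a minority case is assumed to be the more expensive error. The thresholds should be evaluated on held-out data, not on the data used to fit the model.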

0 Answers