
My dataset has severe class imbalance, and the feature columns contain many zeros. Here are the sample counts.

Total Samples: 12237697

Positive samples: 1061 (0.01% of total)

I have tried weighted tree-based methods, cost-sensitive techniques, threshold moving, SMOTE, random undersampling, and oversampling. None of them helped. Here is sample output from a weighted logistic regression. How can I improve my model's performance? Any thoughts would be appreciated. Thanks!

[Image: sample output of the weighted logistic regression]
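To put the counts in perspective, here is a minimal sketch of the arithmetic behind them. It uses only the two numbers given above. The "balanced" weight formula, n_samples / (n_classes * class_count), is the heuristic scikit-learn documents for `class_weight="balanced"`; whether the weighted logistic regression here used that setting is an assumption, and the variable names are purely illustrative.

```python
# Counts taken from the question.
total_samples = 12_237_697
positive_samples = 1_061
negative_samples = total_samples - positive_samples

# Fraction of samples that are positive.
prevalence = positive_samples / total_samples

# Accuracy of a trivial model that always predicts the negative class.
always_negative_accuracy = negative_samples / total_samples

# "Balanced" class weights: n_samples / (n_classes * class_count).
# Assumption: this heuristic is shown for illustration only; the
# weighting actually used in the question is not stated.
n_classes = 2
weight_negative = total_samples / (n_classes * negative_samples)
weight_positive = total_samples / (n_classes * positive_samples)

print(f"prevalence: {prevalence:.6%}")
print(f"always-negative accuracy: {always_negative_accuracy:.6%}")
print(f"balanced weight, negative class: {weight_negative:.6f}")
print(f"balanced weight, positive class: {weight_positive:.2f}")
```

With a prevalence this low, accuracy says very little, since the always-negative model already scores above 99.99%. Precision is better judged against the prevalence itself, which is roughly what a model with no signal would achieve.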

ForestGump
Good news! Class imbalance is not a problem!
https://stats.stackexchange.com/questions/357466/are-unbalanced-datasets-problematic-and-how-does-oversampling-purport-to-he
https://www.fharrell.com/post/class-damage/
https://www.fharrell.com/post/classification/
https://stats.stackexchange.com/a/359936/247274
https://stats.stackexchange.com/questions/464636/proper-scoring-rule-when-there-is-a-decision-to-make-e-g-spam-vs-ham-email
https://twitter.com/f2harrell/status/1062424969366462473?lang=en
– Dave Mar 27 '21 at 03:20
