
I am trying to classify a binary health indicator (0 or 1). My dataset is imbalanced (5700 zeros, 1700 ones), and every variable is binary: all 200 features and the target take only the values 0 and 1. I have tried several resampling approaches: random resampling, SMOTEN (since all features are categorical) combined with random undersampling, resampling with a genetic program (GP), and so on. None of them improved performance. Judging by the ROC curves, the model without resampling performs slightly better than the models with resampling.

If all the values (predictors and target) are binary in an imbalanced classification problem, could you please suggest a good resampling technique other than random resampling, GP, and SMOTEN?

DOT
  • Good news! Class imbalance is not a problem! https://stats.stackexchange.com/questions/357466/are-unbalanced-datasets-problematic-and-how-does-oversampling-purport-to-he https://www.fharrell.com/post/class-damage/ https://www.fharrell.com/post/classification/ https://stats.stackexchange.com/a/359936/247274 https://stats.stackexchange.com/questions/464636/proper-scoring-rule-when-there-is-a-decision-to-make-e-g-spam-vs-ham-email https://twitter.com/f2harrell/status/1062424969366462473?lang=en – Dave Mar 21 '21 at 23:36
  • Okay! But in https://stats.stackexchange.com/questions/283170/when-is-unbalanced-data-really-a-problem-in-machine-learning , the second most voted answer suggests: "You are not interested in accurate global prediction, but only in a rare case. In this case you can inflate the data of that case by bootstrapping the data or if you have enough data throwing a way data of the other cases. Notice that this does bias your data and results and so chances and that kind of results are wrong!" My situation is this third case, so could I get some suggestions for a resampling technique for my problem? – DOT Mar 22 '21 at 17:06
  • I just saw Harrell's comment on the question about predicting small-class membership: "Log likelihood and related methods; Brier score. But don't predict membership. Estimate tendencies". So I am not going to balance the classes. Thanks! – DOT Mar 22 '21 at 17:34
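
For anyone who wants to see what the Brier score and log likelihood from the last comment look like in practice, here is a minimal sketch in plain Python. Only the class counts (5700 zeros, 1700 ones) come from the question. The two "models" are hypothetical and use no features: one predicts the observed prevalence for every case, the other predicts 0.5 for every case, which is assumed here as a stand-in for what an uninformative model would drift toward after the classes are balanced.

```python
import math

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def log_loss(y_true, p_pred, eps=1e-15):
    """Negative mean Bernoulli log-likelihood of the outcomes (lower is better)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Class counts taken from the question: 5700 zeros, 1700 ones.
y = [0] * 5700 + [1] * 1700
prevalence = sum(y) / len(y)

# Hypothetical model A: predict the observed prevalence for every case.
p_prevalence = [prevalence] * len(y)

# Hypothetical model B: predict 0.5 for every case (illustrative assumption
# for an uninformative model fitted to artificially balanced classes).
p_balanced = [0.5] * len(y)

print("prevalence model:", brier_score(y, p_prevalence), log_loss(y, p_prevalence))
print("0.5 model:       ", brier_score(y, p_balanced), log_loss(y, p_balanced))
```

Both scores come out lower (better) for the prevalence model. That is the sense in which proper scoring rules reward estimated tendencies that match the real class frequencies, rather than membership predictions from a rebalanced sample.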
