
I'm working on an imbalanced classification problem in which the minority class makes up only 0.017% of the data.

I've read that imbalanced classification can be handled using undersampling, oversampling, and SMOTE.

The major drawback of undersampling is that we discard a lot of information, and the results with SMOTE are slightly better.
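The information-loss point can be made concrete with the simplest random variants of both techniques. The following is a minimal plain-Python sketch, not any library's implementation: `random_undersample` and `random_oversample` are hypothetical names, and it assumes a binary problem where every label other than `minority` counts as majority. Undersampling drops majority rows; oversampling keeps everything but only adds exact copies of minority rows.

```python
import random
from collections import Counter


def random_undersample(rows, labels, minority, seed=0):
    """Keep every minority row plus an equal-sized random subset of majority rows."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    # Majority rows not drawn here are discarded: this is the information loss.
    kept = minority_idx + rng.sample(majority_idx, len(minority_idx))
    return [rows[i] for i in kept], [labels[i] for i in kept]


def random_oversample(rows, labels, minority, seed=0):
    """Keep every row and add copies of minority rows until the classes are equal."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    n_majority = len(labels) - len(minority_idx)
    # The added rows are exact duplicates, so they carry no new information.
    extra = [rng.choice(minority_idx) for _ in range(n_majority - len(minority_idx))]
    kept = list(range(len(labels))) + extra
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Hypothetical toy data: 2 minority rows (label 1) among 10.
rows = [[float(i)] for i in range(10)]
labels = [1, 1] + [0] * 8

x_under, y_under = random_undersample(rows, labels, minority=1)
x_over, y_over = random_oversample(rows, labels, minority=1)
print(Counter(y_under))  # 2 rows per class: 6 of the 8 majority rows were dropped
print(Counter(y_over))   # 8 rows per class: 6 of the minority rows are copies
```

A model trained on the oversampled set sees the same two minority points many times, which is the overfitting risk usually attributed to plain oversampling.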

My questions are:

  1. SMOTE also undersamples the majority class, so how can the results produced by SMOTE be better?

    SMOTE(form, data, perc.over = 200, k = 5, perc.under = 200, learner = NULL, ...)

    perc.under
    A number that drives the decision of how many extra cases from the majority classes are selected for each case generated from the minority class (known as under-sampling).

  2. For what kinds of classification problems do undersampling or oversampling produce better results than SMOTE?
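The `perc.over`/`perc.under` interaction quoted in question 1 is easiest to see with numbers. According to the DMwR documentation, for `perc.over` of at least 100, `perc.over/100` synthetic cases are created for each minority case, and `perc.under/100` majority cases are drawn for each *synthetic* case. The sketch below applies those two rules to a hypothetical data set of 1,000,000 rows with 170 (0.017%) minority rows; `dmwr_smote_counts` is a made-up helper, not part of DMwR, and truncating `perc.over/100` to an integer is an assumption.

```python
def dmwr_smote_counts(n_minority, n_majority, perc_over=200, perc_under=200):
    """Row counts from one SMOTE call, assuming DMwR's documented rules.

    Only perc_over >= 100 is covered; DMwR handles smaller values differently.
    """
    if perc_over < 100:
        raise ValueError("this sketch only covers perc_over >= 100")
    # perc_over/100 synthetic cases per original minority case (assumed truncated).
    synthetic = n_minority * int(perc_over / 100)
    # perc_under/100 majority cases drawn per *synthetic* minority case.
    majority_drawn = int(perc_under / 100 * synthetic)
    return {
        "synthetic_minority": synthetic,
        "majority_drawn": majority_drawn,
        "final_rows": n_minority + synthetic + majority_drawn,
        "share_of_majority_drawn": majority_drawn / n_majority,
    }


# Hypothetical data set: 1,000,000 rows, 0.017% minority -> 170 minority rows.
counts = dmwr_smote_counts(n_minority=170, n_majority=999_830)
print(counts["synthetic_minority"])  # 340
print(counts["majority_drawn"])      # 680
print(counts["final_rows"])          # 1190
```

Under these assumptions the default call keeps only 680 of the 999,830 majority rows, so SMOTE's own undersampling step still discards over 99.9% of the majority class.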

  • Possibly relevant: https://stats.stackexchange.com/questions/285231/what-problem-does-oversampling-undersampling-and-smote-solve – Matthew Drury Aug 01 '17 at 17:58

1 Answer


To the first question: SMOTE does undersample the majority class, but it also creates synthetic minority-class samples instead of copying existing ones. Plain oversampling only duplicates minority rows, which encourages overfitting, while undersampling on its own discards majority-class information and can lead to underfitting. That is why SMOTE generally gives better results than either oversampling or undersampling. You can learn more about SMOTE in the original paper: https://www.jair.org/media/953/live-953-2037-jair.pdf

To the second question: I don't think oversampling or undersampling will give better results than SMOTE, but if you are looking for a technique that may do better than SMOTE, you can try ADASYN. See https://github.com/stavskal/ADASYN
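The difference between SMOTE and ADASYN can be sketched in plain Python. SMOTE, as described in the linked JAIR paper, places each synthetic point at a random position on the segment between a minority sample and one of its k nearest minority neighbours. ADASYN uses the same interpolation but gives more synthetic points to minority samples whose k nearest neighbours in the full data are mostly majority-class. This is a minimal, unoptimised sketch with brute-force neighbour search, hypothetical function names and toy data; it is not the code from either linked resource, and the uniform fallback when no minority point has majority neighbours is an assumption of this sketch.

```python
import math
import random


def k_nearest(i, points, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def smote(minority, n_per_point, k=5, seed=0):
    """SMOTE-style generation: for minority point i, create n_per_point[i]
    samples on segments towards randomly chosen k-nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        neighbours = k_nearest(i, minority, k)
        for _ in range(n_per_point[i]):
            z = minority[rng.choice(neighbours)]
            gap = rng.random()  # uniform in [0, 1)
            synthetic.append([xi + gap * (zi - xi) for xi, zi in zip(x, z)])
    return synthetic


def adasyn_counts(X, y, minority_label, k=5, beta=1.0):
    """ADASYN-style weights: minority points whose k nearest neighbours in the
    full data are mostly majority get a larger share of the synthetic samples."""
    min_idx = [i for i, label in enumerate(y) if label == minority_label]
    n_major = len(y) - len(min_idx)
    total_new = (n_major - len(min_idx)) * beta  # G: how many samples to create
    # r_i = fraction of majority-class points among the k nearest neighbours.
    r = [sum(y[j] != minority_label for j in k_nearest(i, X, k)) / k for i in min_idx]
    if sum(r) == 0:  # assumption: fall back to uniform weights
        r = [1.0] * len(min_idx)
    total = sum(r)
    return [round(ri / total * total_new) for ri in r]


# Hypothetical 2-D toy data: 4 minority points (label 1), 12 majority (label 0).
minority_pts = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [2.0, 2.0]]
majority_pts = [[2.1 + 0.1 * i, 2.0] for i in range(6)] + [[5.0, 5.0 + 0.1 * i] for i in range(6)]
X = minority_pts + majority_pts  # minority rows first, so counts line up with minority_pts
y = [1] * 4 + [0] * 12

plain = smote(minority_pts, n_per_point=[2] * 4, k=3)  # same count everywhere
g = adasyn_counts(X, y, minority_label=1, k=3)
adaptive = smote(minority_pts, n_per_point=g, k=3)     # ADASYN: adaptive counts
print(g)  # [0, 0, 0, 8]
```

In this toy data the three clustered minority points have no majority neighbours, so the ADASYN weighting assigns all eight synthetic points to the minority point at (2, 2) that sits among majority points, whereas plain SMOTE spreads them evenly.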

Harshit Mehta