
My understanding is that oversampling/undersampling merely manipulates the data to increase the proportion of the rare group in the target. Isn't this manipulating the data? How can you expect a reliable outcome from oversampling/undersampling? For example, you fit a logistic regression on the oversampled/undersampled data and use it to assign a score/probability to the original data. For given values of the variables, that score will differ from the score produced by a logistic regression model developed on the original data. So you need to adjust the score from the model developed on the oversampled/undersampled data. But why is it necessary to adjust the probability score when the purpose of oversampling/undersampling is to increase the probability of the rare class? Doesn't adjusting the probability/score revert everything back to its original state? Can someone explain in simple terms?
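To make the adjustment step concrete, here is a minimal sketch of the arithmetic involved. It assumes the negatives (the common class) are randomly undersampled so that only a fraction `beta` of them is kept, and that the logistic model is well specified; the function names and the value `beta = 0.05` are hypothetical, chosen only for illustration. Under those assumptions, resampling shifts every log-odds by the constant `-log(beta)`, and the correction removes exactly that shift.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def resampled_probability(p, beta):
    """Probability of the rare class after keeping only a fraction
    beta of the negatives (random undersampling, assumed)."""
    return p / (p + beta * (1.0 - p))


def corrected_probability(p_s, beta):
    """Map a probability from the resampled data back to the
    original class proportions."""
    return beta * p_s / (beta * p_s + 1.0 - p_s)


beta = 0.05  # hypothetical: keep 5% of the negatives
original = [0.001, 0.01, 0.05, 0.20]  # illustrative true probabilities

for p in original:
    p_s = resampled_probability(p, beta)
    p_back = corrected_probability(p_s, beta)
    shift = logit(p_s) - logit(p)  # same constant for every p
    print(f"original={p:.3f}  resampled={p_s:.3f}  "
          f"corrected={p_back:.3f}  log-odds shift={shift:.4f}")
```

In this idealized setting the ordering of the scores is unchanged and the correction recovers the original probabilities, which is the situation the question describes: the adjustment undoes the constant shift that the resampling introduced.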

gyambqt
  • Good news! Class imbalance is not a problem! https://stats.stackexchange.com/questions/357466/are-unbalanced-datasets-problematic-and-how-does-oversampling-purport-to-he https://www.fharrell.com/post/class-damage/ https://www.fharrell.com/post/classification/ https://stats.stackexchange.com/a/359936/247274 https://stats.stackexchange.com/questions/464636/proper-scoring-rule-when-there-is-a-decision-to-make-e-g-spam-vs-ham-email https://twitter.com/f2harrell/status/1062424969366462473?lang=en – Dave Jul 07 '21 at 01:10
  • It is perhaps worthwhile to ask when imbalanced data is a problem, and whether that problem is one we are required to solve. https://stats.stackexchange.com/questions/283170/when-is-unbalanced-data-really-a-problem-in-machine-learning – Sycorax Jul 07 '21 at 01:54
  • It still does not really explain it… – gyambqt Jul 07 '21 at 08:30
  • Can you elaborate? I think the linked threads are very clear. What part of these answers is unclear, specifically? – Sycorax Jul 07 '21 at 16:18
  • Could you refer to this: https://stats.stackexchange.com/questions/533678/oversampling-undersampling-issue – gyambqt Jul 08 '21 at 11:16

0 Answers