I've been looking into oversampling the minority class in classification problems. The idea is that, given an imbalanced dataset, you generate synthetic samples of the minority class to balance it out. For example, if you have tabular data describing cats and dogs, with 20 dog rows and 4 cat rows, you would synthetically generate more cat rows to balance the training set. How do you determine how many rows to generate? Do you aim to make the classes exactly equal, i.e. 20 dogs and 20 cats after oversampling? What do you do if the minority class is extremely small, for example 20 dogs and 2 cats? Is there a good algorithm for determining this number?
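To make the arithmetic concrete, here is a minimal sketch of how I would compute the number of rows to generate for a chosen balance level. The function name `rows_to_generate` and the `target_ratio` parameter are my own hypothetical names, not part of any library, and the sketch only counts rows; it does not say how the synthetic rows should be produced or which ratio is the right one.

```python
import math
from collections import Counter


def rows_to_generate(labels, target_ratio=1.0):
    """Count the extra rows each class needs to reach
    target_ratio * (size of the largest class).

    target_ratio=1.0 means fully balanced classes; smaller values
    leave some imbalance in place. Both names are illustrative.
    """
    if not 0 < target_ratio <= 1:
        raise ValueError("target_ratio must be in (0, 1]")
    counts = Counter(labels)
    majority = max(counts.values())
    target = math.ceil(target_ratio * majority)
    # Classes already at or above the target need no extra rows.
    return {cls: max(0, target - n) for cls, n in counts.items()}


labels = ["dog"] * 20 + ["cat"] * 4
print(rows_to_generate(labels))       # {'dog': 0, 'cat': 16}
print(rows_to_generate(labels, 0.5))  # {'dog': 0, 'cat': 6}
```

With 20 dogs and 2 cats the same call asks for 18 synthetic cat rows, all derived from only two real examples, which is the case I am most unsure about.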

user8714896
  • [Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?](https://stats.stackexchange.com/q/357466/1352) – Stephan Kolassa Nov 08 '21 at 07:01