
When we talk about unbalanced data, we usually think about SMOTE, resampling and so on, i.e. the methods covered here: https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets.

What other methods have you seen that are not so well covered in the popular tutorials we find on the internet?

Dumb ML
    Actually, when I talk about unbalanced data, I link to [Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?](https://stats.stackexchange.com/q/357466/1352) – Stephan Kolassa Aug 04 '20 at 14:53

1 Answer


Boosting is widely reported in the literature to be one of the most effective methods, if not the best, for dealing with imbalanced data, and specific algorithms such as RUSBoost (random undersampling combined with boosting) have also been developed. No particular preprocessing is needed.
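To make the RUSBoost idea concrete, here is a minimal RUSBoost-style sketch in plain Python. It assumes binary labels in {-1, +1} with +1 as the minority class, and uses decision stumps as weak learners. In each boosting round a stump is trained on all minority examples plus an equal-sized random sample of the majority class, and the full training set is then reweighted as in discrete AdaBoost. This is a simplification and not the published algorithm, which builds on AdaBoost.M2 and lets you choose the target class ratio. The function names and the toy data are hypothetical. For real work, a library implementation such as imbalanced-learn's `RUSBoostClassifier` is the better choice.

```python
import math
import random

def fit_stump(X, y, w):
    """Return the (feature, threshold, polarity) rule with the lowest
    weighted error; labels y are in {-1, +1}."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for thr in sorted({xi[j] for xi in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[j] >= thr else -pol) != yi)
                if err < best_err:
                    best, best_err = (j, thr, pol), err
    return best

def predict_stump(stump, x):
    j, thr, pol = stump
    return pol if x[j] >= thr else -pol

def rusboost_fit(X, y, n_rounds=10, seed=0):
    """Simplified RUSBoost-style boosting; +1 is the minority class."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n
    minority = [i for i in range(n) if y[i] == 1]
    majority = [i for i in range(n) if y[i] == -1]
    ensemble = []
    for _ in range(n_rounds):
        # Random undersampling: all minority examples plus an equal-sized
        # random subset of the majority class (a 50:50 balance).
        idx = minority + rng.sample(majority, min(len(minority), len(majority)))
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx],
                          [w[i] for i in idx])
        # The weighted error and the weight update use the FULL training set.
        preds = [predict_stump(stump, xi) for xi in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        if err >= 0.5:
            continue  # weak learner no better than chance: skip this round
        alpha = 0.5 * math.log((1 - err) / err)
        # Up-weight misclassified examples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, stump))
    return ensemble

def rusboost_predict(ensemble, x):
    # Weighted vote of the stumps; ties and an empty ensemble give -1.
    score = sum(alpha * predict_stump(stump, x) for alpha, stump in ensemble)
    return 1 if score > 0 else -1

# Hypothetical toy data: one feature, 90 negatives and 10 positives.
X = [[float(i)] for i in range(100)]
y = [1 if i >= 90 else -1 for i in range(100)]
model = rusboost_fit(X, y, n_rounds=5)
print([rusboost_predict(model, [v]) for v in (10.0, 89.0, 95.0)])  # [-1, -1, 1]
```

The undersampling happens inside the boosting loop, on a fresh random subset each round. That is why no separate preprocessing step is needed, and why little information from the majority class is permanently thrown away.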

Neural networks, and probabilistic regression models in general, also work quite well with unbalanced datasets. They estimate class probabilities directly, and the decision threshold can then be chosen to reflect the costs of each type of error. Oversampling and data synthesis are resource-consuming methods that can easily worsen your results; they are old tools that are now mostly obsolete.
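As a sketch of the probabilistic route, the code below fits a plain logistic regression by batch gradient descent on the raw, imbalanced data, with no resampling. It then turns predicted probabilities into decisions using a cost-based threshold. Predicting positive is worthwhile when p * cost_fn > (1 - p) * cost_fp, i.e. when p > cost_fp / (cost_fp + cost_fn). The costs, the toy data and the function names are hypothetical. A neural network with a sigmoid output could replace the logistic model without changing the thresholding step.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Unweighted logistic regression via batch gradient descent on the
    mean log-loss. X: list of feature lists, y: labels in {0, 1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def cost_threshold(cost_fp, cost_fn):
    # Predict positive when p * cost_fn > (1 - p) * cost_fp,
    # i.e. when p > cost_fp / (cost_fp + cost_fn).
    return cost_fp / (cost_fp + cost_fn)

def classify(model, x, threshold):
    return 1 if predict_proba(model, x) > threshold else 0

# Hypothetical toy data: 90 negatives spread over [0, 1) and 10 positives
# concentrated in [0.5, 1), so the two classes overlap.
X = [[i / 90] for i in range(90)] + [[0.5 + i / 20] for i in range(10)]
y = [0] * 90 + [1] * 10

model = fit_logistic(X, y)
# Hypothetical costs: a missed positive is 9 times as costly as a false alarm.
t = cost_threshold(cost_fp=1.0, cost_fn=9.0)
print("threshold:", t)
print("P(y=1 | x=0.9):", round(predict_proba(model, [0.9]), 3))
print("decision at x=0.9:", classify(model, [0.9], t))
```

Because the model is fit with an intercept on the original class frequencies, its average predicted probability on the training data should be close to the base rate (0.1 here). The imbalance is handled by choosing the threshold, not by altering the data.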

carlo