I would like to test my trained model on an imbalanced dataset. Are there any algorithms for generating synthetic data from a balanced labelled dataset (spam/non-spam)?
You can always unbalance any data set by simply undersampling one class. – user2974951 Sep 19 '18 at 10:54
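A minimal sketch of the undersampling idea from the comment above, using only the standard library. The helper name `undersample_class`, the `keep_fraction` parameter, and the toy data are assumptions made up for illustration; they are not from the original thread.

```python
import random


def undersample_class(samples, labels, target_label, keep_fraction, seed=0):
    """Keep a random `keep_fraction` of the rows labelled `target_label`.

    Rows with any other label are kept unchanged, and the original
    order of the surviving rows is preserved.
    """
    rng = random.Random(seed)
    target_idx = [i for i, y in enumerate(labels) if y == target_label]
    n_keep = round(len(target_idx) * keep_fraction)
    kept = set(rng.sample(target_idx, n_keep))
    keep = [i for i, y in enumerate(labels) if y != target_label or i in kept]
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Hypothetical balanced toy data: 100 spam and 100 non-spam ("ham") messages.
texts = [f"message {i}" for i in range(200)]
labels = ["spam"] * 100 + ["ham"] * 100

# Drop 90% of the spam rows to get a roughly 1:10 spam-to-ham test set.
imb_texts, imb_labels = undersample_class(texts, labels, "spam", keep_fraction=0.1)
```

Because this only discards existing rows, the imbalanced test set contains no synthetic data, which may be preferable when the goal is to evaluate a trained model.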
1 Answer
Try SMOTE, an over-sampling algorithm. It creates synthetic samples for the class you want to over-sample.
You can use this to create any number of samples you need.
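To illustrate the idea, here is a rough standard-library sketch of SMOTE-style interpolation: pick a minority sample, pick one of its k nearest minority neighbours, and place a new point somewhere on the line segment between them. The function name `smote_sketch` and the toy vectors are made up for this example, and this is a simplification rather than the reference algorithm. It also assumes numeric feature vectors, so spam text would have to be vectorised first. In practice you would probably reach for the `SMOTE` class in imbalanced-learn instead.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create `n_new` synthetic points from a list of minority-class vectors.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its `k` nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical 2-D feature vectors for the class to be over-sampled.
spam_vectors = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.5], [2.5, 2.0]]
extra_spam = smote_sketch(spam_vectors, n_new=6, k=2)
```

Appending `extra_spam` to the original spam rows makes that class larger than the other one, which is how over-sampling a balanced dataset produces an imbalanced one.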

Mary93
Well, you can obtain undersampling of class A by oversampling class notA ... – kjetil b halvorsen Sep 19 '18 at 14:30
@StuartPeterson No, SMOTE is an over-sampling algorithm, but there are [many other](https://imbalanced-learn.readthedocs.io/en/latest/api.html#module-imblearn.under_sampling) algorithms for under-sampling. – Mary93 Sep 24 '18 at 18:53