12

I would like to test my trained model on an imbalanced dataset. Are there any algorithms for generating synthetic data from a balanced, labelled dataset (spam/non-spam)?

Stuart Peterson

1 Answer

8

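Try SMOTE (Synthetic Minority Over-sampling Technique), an over-sampling algorithm. It creates synthetic samples of the class you want over-sampled by interpolating between existing samples of that class and their nearest neighbours.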

You can use it to generate as many extra samples of one class as you need, which turns your balanced dataset into an imbalanced one.
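To make the idea concrete, here is a minimal pure-Python sketch of the interpolation step SMOTE is built on. It is not the reference implementation: the function name `smote_sketch`, its parameters, and the toy `spam` vectors are all made up for illustration, and it assumes your samples are already numeric feature vectors (raw e-mail text would first need to be turned into features). In practice you would more likely use the `SMOTE` class from `imblearn.over_sampling` in the imbalanced-learn package.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between
    samples of one class and their nearest neighbours.

    Hypothetical helper for illustration only, not a library API.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    if k < 1:
        raise ValueError("need at least two samples to interpolate")

    synthetic = []
    for _ in range(n_new):
        # Pick a random existing sample as the starting point.
        i = rng.randrange(len(minority))
        base = minority[i]

        # Find its k nearest neighbours (Euclidean distance), excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])

        # Place the new point somewhere on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up 2-D feature vectors standing in for the class to over-sample.
spam = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
extra_spam = smote_sketch(spam, n_new=20, k=2)
```

Appending `extra_spam` to the original spam samples while leaving the non-spam class untouched gives you a dataset skewed towards spam; swap the roles to skew it the other way.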

Mary93