
I am trying to train an MLP on an imbalanced dataset. I'd like to use SMOTE to balance my classes; as highlighted here (https://beckernick.github.io/oversampling-modeling/), the class rebalancing should always be done after splitting into train / test sets, because otherwise information from the test set will "spill" into the training set.

In addition to having a test set, however, I would also like to use a validation set by means of the validation_split parameter. Is it safe to use this after having applied SMOTE? Or should I rather split the data first into training, validation, and test sets and pass the validation set via the validation_data parameter of the fit function? Concretely, the second option would look something like the sketch below.
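A minimal sketch of the second option, using imbalanced-learn's SMOTE, scikit-learn's train_test_split, and a small Keras MLP; the data and model here are toy placeholders purely for illustration:

import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Toy imbalanced binary data, purely for illustration (90% / 10%).
X = np.random.rand(1000, 20)
y = np.random.choice([0, 1], size=1000, p=[0.9, 0.1])

# Carve out test and validation sets BEFORE any resampling, so no
# information from held-out points leaks into the synthetic samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, stratify=y_train, random_state=42)

# SMOTE touches only the training fold.
X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

model = Sequential([
    Dense(32, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# validation_data keeps the validation set free of synthetic samples;
# validation_split would instead slice from the SMOTE-augmented data.
model.fit(X_train_res, y_train_res, epochs=10, batch_size=32,
          validation_data=(X_val, y_val))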

Requin

1 Answer


Please take a look at the class_weight argument of the fit method.

fit(x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None, 
validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, 
sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None)

This is another way to handle imbalanced classes. Instead of tackling the problem on the sampling side by over-representing the minority class, this parameter weights the loss of the minority class more heavily, so the optimizer takes bigger corrective steps for those samples and achieves a similar outcome.
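For example, the weights can be derived with scikit-learn's compute_class_weight; the following is a self-contained sketch with toy data, assuming integer class labels 0..n-1:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Toy imbalanced binary data, purely for illustration (90% / 10%).
X_train = np.random.rand(1000, 20)
y_train = np.random.choice([0, 1], size=1000, p=[0.9, 0.1])

# 'balanced' weights each class inversely to its frequency.
weights = compute_class_weight(class_weight='balanced',
                               classes=np.unique(y_train),
                               y=y_train)
class_weight = dict(enumerate(weights))  # e.g. {0: ~0.56, 1: ~5.0}

model = Sequential([
    Dense(32, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# Minority samples now contribute proportionally more to the loss; and
# since no resampling happens, validation_split slices real data only.
model.fit(X_train, y_train, epochs=10, batch_size=32,
          class_weight=class_weight, validation_split=0.2)

Note that because no synthetic samples are created, using validation_split here does not risk the leakage described in the question.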

solver149