
I am trying to train an MLP on an imbalanced dataset. I'd like to use SMOTE to balance my classes; as highlighted here (https://beckernick.github.io/oversampling-modeling/), the class rebalancing should always be done after splitting into train / test sets, because otherwise information from the test set will "spill" into the training set.

In addition to having a test set, however, I would also like to use a validation set by means of the validation_split parameter. Is it safe to use this after having applied SMOTE? Or should I rather split the data first into training, validation, and test sets and pass the validation set via the validation_data parameter of the fit function? Concretely, the second option would look something like the sketch below.
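A minimal sketch of the second option, using imbalanced-learn's SMOTE, scikit-learn's train_test_split, and a small Keras MLP; the data and model here are toy placeholders purely for illustration:

import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Toy imbalanced binary data, purely for illustration (90% / 10%).
X = np.random.rand(1000, 20)
y = np.random.choice([0, 1], size=1000, p=[0.9, 0.1])

# Carve out test and validation sets BEFORE any resampling, so no
# information from held-out points leaks into the synthetic samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, stratify=y_train, random_state=42)

# SMOTE touches only the training fold.
X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

model = Sequential([
    Dense(32, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# validation_data keeps the validation set free of synthetic samples;
# validation_split would instead slice from the SMOTE-augmented data.
model.fit(X_train_res, y_train_res, epochs=10, batch_size=32,
          validation_data=(X_val, y_val))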

Requin

1 Answer


Please take a look at the class_weight argument of the fit method.

fit(x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None, 
validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, 
sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None)

This is another way to handle imbalanced classes. Instead of tackling the problem on the sampling side by over-representing the minority class, this parameter weights the loss of the minority class more heavily, so the optimizer takes bigger corrective steps for those samples and achieves a similar outcome.
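For example, the weights can be derived with scikit-learn's compute_class_weight; the following is a self-contained sketch with toy data, assuming integer class labels 0..n-1:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Toy imbalanced binary data, purely for illustration (90% / 10%).
X_train = np.random.rand(1000, 20)
y_train = np.random.choice([0, 1], size=1000, p=[0.9, 0.1])

# 'balanced' weights each class inversely to its frequency.
weights = compute_class_weight(class_weight='balanced',
                               classes=np.unique(y_train),
                               y=y_train)
class_weight = dict(enumerate(weights))  # e.g. {0: ~0.56, 1: ~5.0}

model = Sequential([
    Dense(32, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# Minority samples now contribute proportionally more to the loss; and
# since no resampling happens, validation_split slices real data only.
model.fit(X_train, y_train, epochs=10, batch_size=32,
          class_weight=class_weight, validation_split=0.2)

Note that because no synthetic samples are created, using validation_split here does not risk the leakage described in the question.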

solver149