I have an unbalanced dataset and would like to apply SMOTE to the training data. I can do one of the following:
- Inside `trainControl()`, add `sampling = "smote"` and then run `train()`.
- First sample the training data using `SMOTE()`, do NOT include `sampling` in `trainControl()`, and then run `train()`.

For `SMOTE()` I used the default parameters as in the documentation: `SMOTE(form, data, perc.over = 200, k = 5, perc.under = 200, learner = NULL, ...)`
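
The two options above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the simulated data from `caret::twoClassSim()`, the `rpart` model, and the 5-fold cross-validation are placeholders rather than the actual setup, and `SMOTE()` is assumed to be the function from the DMwR package.

```r
library(caret)
library(DMwR)  # assumption: SMOTE() is DMwR::SMOTE (the package is archived on CRAN)

set.seed(1)
# Placeholder imbalanced two-class data; NOT the data from the question
train_df <- twoClassSim(2000, intercept = -20, linearVars = 20)

# Option 1: let caret apply SMOTE inside each resampling iteration.
# Depending on the caret version, this may need DMwR or themis installed.
ctrl_smote <- trainControl(method = "cv", number = 5, sampling = "smote")
fit1 <- train(Class ~ ., data = train_df, method = "rpart",
              trControl = ctrl_smote)

# Option 2: apply SMOTE once to the training data, then train without sampling
smoted_df <- SMOTE(Class ~ ., data = train_df,
                   perc.over = 200, k = 5, perc.under = 200)
ctrl_plain <- trainControl(method = "cv", number = 5)
fit2 <- train(Class ~ ., data = smoted_df, method = "rpart",
              trControl = ctrl_plain)

# Class counts before and after the manual SMOTE() call
table(train_df$Class)
table(smoted_df$Class)
```

Comparing the two `table()` outputs shows how the manual `SMOTE()` call changes the class counts relative to the original training data.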
However, I end up with different results because the training datasets are different sizes: the first option maintains the number of observations, but the second option reduces it.
I would like to know why and which one would be the correct way of doing it. Thanks.