Getting different results when running SMOTE

Question

I have this code, which runs SMOTE and then computes roc_auc_score.

The issue is that every time I run the code on the same dataset, I get different results.

How can I fix this? I need the same sample and the same results each time I run my code.

The ROC curve also changes between runs.

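```python
from imblearn.over_sampling import SMOTE
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# df is a pandas DataFrame with a 'target' column (loaded elsewhere).
y = df.target
X = df.drop('target', axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=27)

# Oversample the minority class in the training split only.
sm = SMOTE(random_state=27, sampling_strategy=1.0)
# fit_resample is the current name; fit_sample was removed from newer imbalanced-learn releases.
X_train, y_train = sm.fit_resample(X_train, y_train)

smote_nn = MLPClassifier(hidden_layer_sizes=(10, 10, 10), max_iter=1000).fit(X_train, y_train)

# Predicted probability of the positive class.
smote_pred_nn = smote_nn.predict_proba(X_test)[:, 1]

false_positive_rate, true_positive_rate, threshold1 = roc_curve(y_test, smote_pred_nn)
print('roc_auc_score for NN: ', roc_auc_score(y_test, smote_pred_nn))
```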

This sounds like a pure coding question, not statistics. // Good news! Class imbalance is not a problem! https://stats.stackexchange.com/questions/357466/are-unbalanced-datasets-problematic-and-how-does-oversampling-purport-to-he https://www.fharrell.com/post/class-damage/ https://www.fharrell.com/post/classification/ https://stats.stackexchange.com/a/359936/247274 https://stats.stackexchange.com/questions/464636/proper-scoring-rule-when-there-is-a-decision-to-make-e-g-spam-vs-ham-email https://twitter.com/f2harrell/status/1062424969366462473?lang=en — Dave, Jun 03 '21 at 18:34

score 1 · Answer 1 · answered Jun 04 '21 at 08:29

You need to set a seed so that the (pseudo)random number generator always starts from the same state when you run the code.

```python
import numpy as np

np.random.seed(123)
```
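In the question's code, `train_test_split` and `SMOTE` are already seeded, but `MLPClassifier` is not, so its weight initialization differs on every run. Seeding NumPy's global generator works because scikit-learn estimators fall back to it when `random_state` is left unset. An alternative is to pass `random_state` to the classifier directly. The following is only a minimal sketch of that idea: it uses synthetic data from `make_classification` as a stand-in for `df`, leaves out the SMOTE step so that it needs only scikit-learn, and the helper name `fit_and_score` is made up for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def fit_and_score(seed):
    """Train a small MLP with a fixed seed and return its test ROC AUC."""
    # Synthetic, imbalanced stand-in for the question's `df` (an assumption).
    X, y = make_classification(
        n_samples=400, n_features=10, weights=[0.8, 0.2], random_state=27
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=27
    )
    # random_state fixes the weight initialization and the batch shuffling.
    model = MLPClassifier(
        hidden_layer_sizes=(10, 10, 10), max_iter=1000, random_state=seed
    )
    model.fit(X_train, y_train)
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])


# Two runs with the same seed should give the same score.
first = fit_and_score(seed=27)
second = fit_and_score(seed=27)
print(first, second)
```

Passing `random_state` per estimator keeps the seeding local to each component, whereas `np.random.seed` changes global state that other code may also draw from.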

Robert Long

