I have this code, which runs SMOTE and then computes `roc_auc_score`.

The issue is that every time I run the code on the same dataset, I get different results.

How can I fix this? I need the same sample and the same results each time I run my code.

The ROC curve also changes between runs.

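```python
from imblearn.over_sampling import SMOTE
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# df is a pandas DataFrame with a binary 'target' column
y = df.target
X = df.drop('target', axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=27)

# Oversample the minority class in the training set only
# (fit_resample is the current name of the older fit_sample method)
sm = SMOTE(random_state=27, sampling_strategy=1.0)
X_train, y_train = sm.fit_resample(X_train, y_train)

smote_nn = MLPClassifier(hidden_layer_sizes=(10, 10, 10), max_iter=1000).fit(X_train, y_train)

smote_pred_nn = smote_nn.predict_proba(X_test)[:, 1]

false_positive_rate, true_positive_rate, threshold1 = roc_curve(y_test, smote_pred_nn)
print('roc_auc_score for NN: ', roc_auc_score(y_test, smote_pred_nn))
```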
Eliza
  • This sounds like a pure coding question, not statistics. // Good news! Class imbalance is not a problem! https://stats.stackexchange.com/questions/357466/are-unbalanced-datasets-problematic-and-how-does-oversampling-purport-to-he https://www.fharrell.com/post/class-damage/ https://www.fharrell.com/post/classification/ https://stats.stackexchange.com/a/359936/247274 https://stats.stackexchange.com/questions/464636/proper-scoring-rule-when-there-is-a-decision-to-make-e-g-spam-vs-ham-email https://twitter.com/f2harrell/status/1062424969366462473?lang=en – Dave Jun 03 '21 at 18:34
  • You haven't set a seed for the MLP. – Ben Reiniger Jun 03 '21 at 18:43

1 Answer

You need to set a seed so that the (pseudo)random number generator always starts from the same state each time you run the code.

```python
np.random.seed(123)
```
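As an alternative to seeding NumPy's global generator, the seed can be passed to the estimator itself, which is what the comment about the MLP points at. The following is only a minimal sketch: the synthetic data from `make_classification` stands in for the asker's `df`, the helper name `fit_and_score` is made up for illustration, and SMOTE is left out because it is already seeded in the question.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def fit_and_score(seed):
    """Train an MLP with a fixed random_state and return the test ROC AUC."""
    # Assumption: synthetic, imbalanced data standing in for the asker's df.
    X, y = make_classification(
        n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=0
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=27, stratify=y
    )
    # random_state fixes the weight initialisation and the batch shuffling.
    model = MLPClassifier(
        hidden_layer_sizes=(10, 10, 10), max_iter=1000, random_state=seed
    )
    model.fit(X_train, y_train)
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])


# Two independent runs with the same seed should give the same score.
auc_first = fit_and_score(seed=27)
auc_second = fit_and_score(seed=27)
print(auc_first == auc_second)
```

Passing `random_state` to each estimator keeps the seeding local to that object, so reordering or adding other random calls elsewhere in the script should not change the model's results.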
Robert Long