Let's say I have a very basic binary classification problem and I use logistic regression. The logistic regression gives me a score between 0 and 1 (not a class label yet).
I can use sklearn's roc_auc_score to compute the ROC AUC on the training set easily with roc_auc_score(y_train, predicted_scores). As I understand it, the function will find the best threshold for me.
However, if I want to check the ROC AUC for my validation set, can I just use roc_auc_score(y_val, predicted_val_scores)? Wouldn't it then look for the best threshold again? Should I instead find a way to reuse the same threshold as in the first call? Or am I overthinking this?
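
For concreteness, here is a minimal sketch of the setup I mean. The synthetic data from make_classification, the split parameters, and the names X_train, X_val, and model are assumptions for illustration only; they stand in for my real data and pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumption: stands in for the real data).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit on the training set only.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Scores between 0 and 1: predicted probability of the positive class.
predicted_scores = model.predict_proba(X_train)[:, 1]
predicted_val_scores = model.predict_proba(X_val)[:, 1]

# The two calls from the question: one per data set, both taking raw scores.
train_auc = roc_auc_score(y_train, predicted_scores)
val_auc = roc_auc_score(y_val, predicted_val_scores)

print(f"train ROC AUC: {train_auc:.3f}")
print(f"val ROC AUC:   {val_auc:.3f}")
```

Note that I pass the raw scores to both calls and never pick a threshold myself, which is why I am unsure what happens with the threshold on the validation set.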