Should I use predict_proba or predict when computing metrics

Question

I need to compute some metrics for binary classification. I often see people use the predicted probabilities:

y_pred_proba = clf.fit(X_train, y_train).predict_proba(X_test)
roc_auc_score(y_test, y_pred_proba[:,1]) # probability of Class 1

and other times:

y_pred = clf.fit(X_train, y_train).predict(X_test)
roc_auc_score(y_test, y_pred) # binary outcome y_pred

If I try both I get completely different results.

Can anyone explain which one should be used with the metric functions, predict or predict_proba?

AUROC requires probabilities of the predictions, not classes. Your second approach is wrong. — user2974951, Jan 25 '21 at 08:30
great! many thanks, is this true for all the metrics? f1_score, recall_score etc.? — Luigi87, Jan 25 '21 at 08:31
No. F1, recall and similar require classes. Obligatory reference, because someone will link it eventually, [Why is accuracy not the best measure for assessing classification models?](https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models/312787#312787). — user2974951, Jan 25 '21 at 08:34
ok great! if you are willing to answer I will vote up your answer! — Luigi87, Jan 25 '21 at 08:35

score 2 · Accepted Answer · answered Jan 25 '21 at 08:43


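AUROC is a semi-proper scoring rule. It does not pick a threshold: it uses the raw probabilities to measure how well the model ranks the two classes across all possible thresholds. A default call to predict, by contrast, applies a single "non-informative" threshold of 0.5 (for most classifiers that expose predict_proba), so passing its output to roc_auc_score collapses the ROC curve to one operating point, which is why your two results differ.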
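
To make the difference concrete, here is a minimal sketch. The synthetic dataset and the choice of LogisticRegression are assumptions for illustration only and are not from the question. It computes AUROC both ways and checks that, for this model, predict matches thresholding the class-1 probability at 0.5. With hard labels the ROC curve has a single operating point, so the resulting "AUROC" equals balanced accuracy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary problem (assumed data, for illustration only).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

proba_pos = clf.predict_proba(X_test)[:, 1]  # probability of class 1
labels = clf.predict(X_test)                 # hard 0/1 predictions

# AUROC from probabilities: evaluates the ranking over all thresholds.
auc_from_proba = roc_auc_score(y_test, proba_pos)

# AUROC from hard labels: only one operating point is available.
auc_from_labels = roc_auc_score(y_test, labels)

# For this binary logistic regression, predict agrees with a 0.5 cut-off.
same_as_predict = np.array_equal(labels, (proba_pos > 0.5).astype(int))

# With a single operating point, the area reduces to (TPR + TNR) / 2.
bal_acc = balanced_accuracy_score(y_test, labels)

print(auc_from_proba, auc_from_labels, bal_acc, same_as_predict)
```

The value to report as AUROC is auc_from_proba; auc_from_labels is a different quantity that depends on the 0.5 cut-off.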

Other measures such as accuracy, F1, recall, and others are not proper scoring rules, and they work on classes, so they do not bother with the actual probabilities but require you to classify the observations beforehand.
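
As a rough sketch of what "classify beforehand" means in practice (again on an assumed synthetic dataset with an assumed LogisticRegression model, and with 0.3 as an arbitrary illustrative cut-off), the class-based metrics take hard labels, which can come from predict or from your own threshold on predict_proba. Passing probabilities directly to them is rejected.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data (assumed, for illustration only).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba_pos = clf.predict_proba(X_test)[:, 1]

# Hard labels from the default decision rule.
labels_default = clf.predict(X_test)

# Hard labels from a lower, hand-picked threshold (0.3 is arbitrary here).
labels_low = (proba_pos >= 0.3).astype(int)

f1_default = f1_score(y_test, labels_default)
recall_default = recall_score(y_test, labels_default)
recall_low = recall_score(y_test, labels_low)  # cannot be below recall_default

# Class-based metrics do not accept continuous probabilities.
try:
    f1_score(y_test, proba_pos)
    raised = False
except ValueError:
    raised = True

print(f1_default, recall_default, recall_low, raised)
```

Lowering the threshold can only flag more observations as positive, so recall cannot go down, usually at the cost of precision.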

See the linked thread for some more details about why these are not the best metrics.

user2974951


Just one more comment, please. If I get it right, roc_auc_score must always be preferred to f1_score, recall_score, and precision_score, because the latter are based on classes while roc_auc_score is based on probabilities. Is this true even if I have an imbalanced dataset in which I want to minimise false negatives (for which I would use recall_score)? – Luigi87 Jan 25 '21 at 09:34

@Luigi87 AUROC is a better alternative than F1, recall, precision, etc., as you said, because it incorporates more information from the probabilities, rather than just discrete classes. If I had to choose then I would rather choose AUROC, however, as the thread mentions, there are even better alternatives, such as Brier score. – user2974951 Jan 25 '21 at 10:00
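
For reference, a small sketch of the Brier score mentioned above, which is also computed from predict_proba output. The dataset and model are assumptions for illustration; brier_score_loss is the scikit-learn function for this metric, and for a binary problem it is the mean squared difference between the predicted probability of class 1 and the 0/1 outcome.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic binary problem (assumed data, for illustration only).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba_pos = clf.predict_proba(X_test)[:, 1]  # probability of class 1

# Brier score: lower is better, 0 means perfect probabilistic predictions.
brier = brier_score_loss(y_test, proba_pos)

# The same quantity computed by hand as a sanity check.
brier_manual = np.mean((proba_pos - y_test) ** 2)

print(brier, brier_manual)
```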
