4

I have a dataset in which the target is 99.95% 0's and 0.05% 1's. The dataset contains a million rows. I want to build a binary classification model that predicts almost all the 1's correctly while keeping the false positives to a minimum.

I have read somewhere that AUC-PRC is a better metric than AUC-ROC for this scenario. Is that correct?

Aman

3 Answers

3

Neither seems appropriate. Rather, assign whatever penalty scores you want to the two kinds of errors (mistaking a 0 for a 1, and mistaking a 1 for a 0) and sum the penalties over all errors. This lets you control the tradeoff precisely.
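
For what it's worth, a minimal sketch of this idea in base R — the labels, predictions, and penalty values below are all hypothetical; choose penalties that reflect your actual costs:

```r
# Hypothetical example data (replace with your own labels and model predictions)
set.seed(1)
truth <- rbinom(1000, 1, 0.05)   # true 0/1 labels
pred  <- rbinom(1000, 1, 0.05)   # 0/1 predictions from some model

# Hypothetical penalties: missing a 1 is treated as much worse here
fp_penalty <- 1     # penalty for mistaking a 0 for a 1 (false positive)
fn_penalty <- 50    # penalty for mistaking a 1 for a 0 (false negative)

fp <- sum(pred == 1 & truth == 0)
fn <- sum(pred == 0 & truth == 1)

total_penalty <- fp * fp_penalty + fn * fn_penalty
total_penalty
```

A model (or threshold) that yields a lower total penalty is preferred under whatever cost ratio you chose.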

Kodiologist
  • Thank you for your answer. I am new to machine learning and don't know how to implement your answer in R code. Could you please elaborate a little? How can I assign penalty scores in a model? I plan to use some boosting method to model the data and later modify things based on the results I get. – Aman Feb 21 '17 at 03:24
  • I believe Kodo is talking about weighted accuracy. – SmallChess Feb 21 '17 at 05:27
  • @Aman That sounds like a programming question, in which case Stack Overflow, not here, is the right site to ask. – Kodiologist Feb 21 '17 at 15:25
  • I think you are over-simplifying things. While imperfect (a "perfect metric" is very subjective), AUCPR is far from totally inappropriate. See some references on the subject (e.g. @Marc's answer [here](http://stats.stackexchange.com/questions/90779) and papers like [here](http://machinelearning.wustl.edu/mlpapers/paper_files/icml2006_DavisG06.pdf) and [here](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0118432)). It is better for people to use AUCPR, which they are unlikely to misspecify, than to assign arbitrary weights and penalties and end up with a "salad metric". – usεr11852 Feb 25 '17 at 13:26
  • @usεr11852 Notice that the OP wants to build a classifier, whereas the two AUC measures attempt to summarize the performance of a signal over a number of classifiers that could be defined with it. Only the classifier actually used matters. – Kodiologist Feb 25 '17 at 16:40
  • @Kodiologist: The OP describes an extremely common scenario ("*build a binary classification model that predicts almost all the 1's correctly while keeping the false positives to a minimum*"); I cannot see how this makes the question any different from dozens of other imbalanced classification tasks. A PR curve will let the OP make a much more informed decision about the threshold to use than simply defining a (potentially arbitrary) cost. Your suggestion would just move the question from picking a threshold to picking relative weights, but without the direct diagnostics provided by a PR curve. – usεr11852 Feb 25 '17 at 16:48
  • @usεr11852 I'm saying that AUC isn't useful here, not that the entire precision–recall graph isn't useful. – Kodiologist Feb 25 '17 at 18:57
  • @Kodiologist: That much is obvious. :) I am saying, though, that this assertion is over-simplifying here. We are presented with a *standard* imbalanced learning problem; you advocate a very particular approach (cost-sensitive learning) and dismiss another one (AUCPR) for no obvious reason. – usεr11852 Feb 25 '17 at 20:00
  • @usεr11852 Because of what I just said, that the AUC summarizes performance over all thresholds rather than the performance of a particular classifier, which is what's of interest. – Kodiologist Feb 25 '17 at 23:28
  • @Kodiologist: But PR curves are of interest! I agree that what you propose has some merit (after all, I have not downvoted your answer, exactly for this reason). On the other hand, plotting the PR curves and using the AUCPR should not be dismissed. You are effectively suggesting using a 0.50 threshold and hoping that the weighting will be adequate. I think this over-simplifies both the use of PR curves and the difficulty of finding a weighting scheme. – usεr11852 Feb 25 '17 at 23:51
2

You can look at precision, recall, and the F1 score, which is simply the harmonic mean of precision and recall.
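
For example, a small base-R sketch with hypothetical label and prediction vectors:

```r
# Hypothetical true labels and predictions
truth <- c(1, 0, 0, 1, 0, 1, 0, 0)
pred  <- c(1, 0, 1, 0, 0, 1, 0, 0)

tp <- sum(pred == 1 & truth == 1)
fp <- sum(pred == 1 & truth == 0)
fn <- sum(pred == 0 & truth == 1)

precision <- tp / (tp + fp)                         # 2/3 here
recall    <- tp / (tp + fn)                         # 2/3 here
f1        <- 2 * precision * recall / (precision + recall)
c(precision = precision, recall = recall, F1 = f1)
```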

2

Your reading is correct in the sense that AUC-PRC is a better metric than AUC-ROC for imbalanced classification. I disagree with Kodi in the sense that AUC can still be useful in these scenarios. Like Santanu said, you could look at precision, recall, and F1; I would add sensitivity and the kappa statistic to that list.
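
As a rough illustration, here is one way to approximate AUC-PR in base R via average precision; the labels and scores below are simulated placeholders, and in practice you would use your model's predicted probabilities:

```r
# Simulated imbalanced labels and scores (placeholders for real model output)
set.seed(7)
y      <- rbinom(5000, 1, 0.02)                      # 0/1 truth, heavily imbalanced
scores <- runif(5000) * 0.3 + 0.7 * y * runif(5000)  # fake scores, higher for 1's

# Precision and recall at every possible threshold (sorted by decreasing score)
ord       <- order(scores, decreasing = TRUE)
y_ord     <- y[ord]
tp        <- cumsum(y_ord)
fp        <- cumsum(1 - y_ord)
precision <- tp / (tp + fp)
recall    <- tp / sum(y)

# Average precision: precision summed over the points where recall increases
auc_pr <- sum(diff(c(0, recall)) * precision)
auc_pr
```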

However, the choice of metric is not the only way to handle imbalanced classification. You could also look at sampling techniques such as SMOTE, or at treating the task as a probability estimation problem with a shifted decision threshold (sketched below), among other approaches discussed here and elsewhere.
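
As a rough sketch of the probability-estimation-with-shifted-threshold idea, using base R only (the simulated data and the 0.01 cut-off are arbitrary; the threshold should be tuned on validation data, e.g. against a precision-recall curve):

```r
# Simulate a rare-positive problem
set.seed(42)
n <- 10000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-6 + 2 * x))      # rare positive class

fit  <- glm(y ~ x, family = binomial)      # probability estimation model
prob <- predict(fit, type = "response")    # class-1 probabilities

# Instead of the default 0.5 cut-off, pick a much lower threshold so that
# most 1's are caught; tune it on held-out data against precision/recall.
pred <- as.integer(prob > 0.01)
table(truth = y, predicted = pred)
```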

discipulus