Performance measure suited to imbalanced classes and robust to changing class ratios

Question

I am looking for the best performance measure for comparing binary classifiers across datasets with different class ratios.

My use case: I want to find out which dataset can be modelled best with binary classification. Each dataset has an active minority class that I am interested in, and the datasets have different class ratios (it is actually the same dataset, with the classes produced by different discretization thresholds):

dataset A has active class ratio 1/10
dataset B has active class ratio 2/10
dataset C has active class ratio 3/10

AUC ROC is not suitable, as it gives over-optimistic results for highly unbalanced datasets. Hence it will probably favor dataset A.

Average Precision (a.k.a. area under the precision-recall curve) is not suitable either: its baseline is the class ratio, so it will favor dataset C.

Any help would be very much appreciated.

Christopher John · Answer 1 · 2019-10-20T07:22:17.607

1

AUC-ROC is insensitive to class distribution and is one of the best metrics for imbalanced data. I don’t think it will prefer A, because the diagonal always represents chance. In contrast, PR-AUC is not suitable for comparing datasets with different class ratios, because its baseline varies with the ratio.

See this NIPS paper (https://papers.nips.cc/paper/5867-precision-recall-gain-curves-pr-analysis-done-right.pdf) for examples of the many preferable attributes of ROC-AUC and the problems with PR-AUC. The authors also propose their own solution, the precision-recall gain curve, which is nice because it corrects some of the issues with PR while keeping its special sensitivity to false positives.
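The baseline argument above can be checked with a small simulation. This is only a minimal sketch on synthetic data, not the asker's datasets: it assumes Gaussian scores with the same class-conditional distributions at every class ratio, so the classifier is equally "good" in all three cases and only the ratio changes. The helper names (`roc_auc`, `average_precision`, `simulate`) are made up for illustration, and both metrics are computed by hand so that nothing beyond the standard library is needed.

```python
import random

def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels))  # ascending by score
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sum of the 1-based ranks of the positive examples.
    rank_sum = sum(i + 1 for i, (_, y) in enumerate(ranked) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def average_precision(labels, scores):
    """Mean of the precision at each positive, ranking by descending score."""
    ranked = sorted(zip(scores, labels), reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_positives += 1
            precision_sum += true_positives / k
    return precision_sum / true_positives

def simulate(pos_ratio, n=20000, seed=0):
    """Draw labels at the given positive ratio and score them.

    Assumption: positives score ~ N(1, 1) and negatives ~ N(0, 1),
    regardless of the ratio.
    """
    rng = random.Random(seed)
    labels = [1 if rng.random() < pos_ratio else 0 for _ in range(n)]
    scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]
    return roc_auc(labels, scores), average_precision(labels, scores)

# Ratios taken from datasets A, B and C in the question.
results = {ratio: simulate(ratio) for ratio in (0.1, 0.2, 0.3)}
for ratio, (auc, ap) in results.items():
    print(f"positive ratio {ratio:.1f}: ROC AUC = {auc:.3f}, AP = {ap:.3f}")
```

Under these assumptions the ROC AUC should stay roughly constant across the three ratios, while the average precision should rise with the positive ratio even though the score distributions never change.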

edited Oct 20 '19 at 07:22

answered Oct 08 '19 at 17:15

Christopher John


Unfortunately, it does. "ROC curves can present an overly optimistic view of an algorithm’s performance if there is a large skew in the class distribution" (https://dl.acm.org/citation.cfm?id=1143874). Here is a nice blog post: https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python – user954923 Oct 09 '19 at 10:29
I've actually read that blog post. For a discussion of the whole topic, see this Stack Exchange thread (https://stats.stackexchange.com/questions/7207/roc-vs-precision-and-recall-curves), in particular the answer by David Powers, a senior academic who has published prolifically on ROC and its properties. ROC-AUC is one of the best metrics for imbalanced data (compare it with accuracy, for example), and PR-AUC has a whole load of problems that ROC-AUC does not. For example, PR-AUC is totally unsuitable for comparing models run on datasets with different class ratios. – Christopher John Oct 09 '19 at 10:58
