Why is AUC = 1 even though the classifier has misclassified half of the samples?

Question

I am using a classifier that returns probabilities. To calculate the AUC, I am using the pROC R package. The classifier's output probabilities are:

probs=c(0.9865780,
0.9996340,
0.9516880,
0.9337157,
0.9778576,
0.8140116,
0.8971550,
0.8967585,
0.6322902,
0.7497237)

probs gives the probability of being in class '1'. As shown, the classifier has assigned all of the samples to class '1'.

True label vector is:

truel=c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)

As shown, the classifier has misclassified 5 samples. But the AUC is:

pROC::auc(truel, probs)
Area under the curve: 1

Could you please explain to me why it happens?

Related question: http://stats.stackexchange.com/questions/97395 — Juho Kokkala, Mar 10 '16 at 11:39

score 24 · Answer 1 · answered Mar 10 '16 at 01:39

The other answers explain what is happening, but I thought a picture might be nice.

You can see that the classes are perfectly separated, so the AUC is 1, but thresholding at 1/2 will produce a misclassification rate of 50%.
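For anyone who wants to reproduce a picture along these lines, here is a minimal base-R sketch using the question's data. It is not necessarily the plot originally posted; the layout (one row of points per true class, a dashed line at the 1/2 cutoff, a dotted line in the gap between the classes) is my own assumption about what is useful to show.

```r
# Data copied from the question
probs <- c(0.9865780, 0.9996340, 0.9516880, 0.9337157, 0.9778576,
           0.8140116, 0.8971550, 0.8967585, 0.6322902, 0.7497237)
truel <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)

# One row of points per true class, predicted probability on the x axis
stripchart(probs ~ truel, pch = 19, xlim = c(0, 1),
           xlab = "Predicted probability of class 1", ylab = "True class")
abline(v = 0.5, lty = 2)  # the 1/2 cutoff: every point lies to its right

# Highest class-0 score and lowest class-1 score; any cutoff strictly
# between them separates the two classes without error
gap <- c(max(probs[truel == 0]), min(probs[truel == 1]))
abline(v = mean(gap), lty = 3)
```

The dashed line shows why thresholding at 1/2 labels everything as class 1, while the dotted line marks a cutoff at which no sample is on the wrong side.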

jld


score 22 · Accepted Answer · answered Mar 09 '16 at 18:51

The AUC measures the ability to rank examples according to their probability of class membership. Thus, even if all of the probabilities are above 0.5, you can still have an AUC of one, provided that every positive pattern has a higher probability than every negative pattern. In that case there is a decision threshold higher than 0.5 that gives an error rate of zero. Note that because the AUC only measures the ranking of the probabilities, it doesn't tell you whether the probabilities are well calibrated (e.g. free of systematic bias). If calibration of the probabilities is important, look at the cross-entropy metric.
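To make the threshold and calibration points concrete, here is a small base-R sketch on the question's data. The candidate cutoffs are arbitrary values I picked for illustration, and the cross-entropy is computed by hand as the mean negative log-likelihood rather than through any particular package.

```r
# Data copied from the question
probs <- c(0.9865780, 0.9996340, 0.9516880, 0.9337157, 0.9778576,
           0.8140116, 0.8971550, 0.8967585, 0.6322902, 0.7497237)
truel <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)

# Error rate at a few illustrative cutoffs: predict class 1 when the
# probability exceeds the cutoff, then compare with the true labels
cutoffs <- c(0.5, 0.8, 0.9, 0.95)
errors <- sapply(cutoffs, function(cut) mean(as.numeric(probs > cut) != truel))
print(data.frame(cutoff = cutoffs, error = errors))

# Mean cross-entropy (log loss); lower is better. Confident probabilities
# assigned to the wrong class (e.g. 0.897 for a true 0) are penalised heavily
cross_entropy <- -mean(truel * log(probs) + (1 - truel) * log(1 - probs))
print(cross_entropy)
```

The cutoff of 0.9 falls between the two groups of scores, so its error rate is zero even though the error at 0.5 is one half. The cross-entropy, by contrast, comes out worse than log(2), which is what you would get by predicting 0.5 for every sample. That is the calibration problem the AUC cannot see.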

Sycorax · Answer 3 · 2016-03-15T19:14:20.390

22

The samples weren't "misclassified" at all. The 0 examples are ranked strictly lower than the 1 examples. AUROC is doing exactly what it's defined to do, which is measure the probability that a randomly-selected 1 is ranked higher than a randomly-selected 0. In this sample, this is always true, so it's a probability 1 event.
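That probabilistic definition can be checked directly without pROC. The following is a sketch in base R: it compares every (1, 0) pair, counting ties as one half (the usual convention), and then recomputes the same quantity from the rank-sum form. The variable names are mine.

```r
# Data copied from the question
probs <- c(0.9865780, 0.9996340, 0.9516880, 0.9337157, 0.9778576,
           0.8140116, 0.8971550, 0.8967585, 0.6322902, 0.7497237)
truel <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)

pos <- probs[truel == 1]  # scores of the true 1s
neg <- probs[truel == 0]  # scores of the true 0s

# Fraction of (1, 0) pairs in which the 1 is ranked higher; ties count half
wins <- outer(pos, neg, ">")
ties <- outer(pos, neg, "==")
auc_pairs <- mean(wins + 0.5 * ties)

# Same quantity via the Mann-Whitney rank-sum form
r <- rank(probs)
n_pos <- length(pos)
n_neg <- length(neg)
auc_rank <- (sum(r[truel == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(c(pairwise = auc_pairs, rank_sum = auc_rank))
```

Both numbers are 1 here because all 25 pairs are ordered correctly. Nothing in the calculation refers to a 0.5 cutoff.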

Tom Fawcett has a great expository article about ROC curves. I'd suggest starting there.

Tom Fawcett. "An Introduction to ROC Analysis." Pattern Recognition Letters. 2005.

edited Mar 15 '16 at 19:14

answered Mar 09 '16 at 18:55

Sycorax


3

+1 The Fawcett paper is indeed a very good place to start. – Dikran Marsupial Mar 09 '16 at 19:06
