
I'm probably missing something very obvious, but I don't understand in what cases the AUC (area under the curve) actually matters.

From what I gather, AUC essentially evaluates the model over a range of possible thresholds and averages the results, unlike, for example, F1, which is always computed at a single threshold of 0.5.
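To show what I mean, here is a small sketch (the labels and scores below are made up for illustration, using scikit-learn's f1_score and roc_auc_score):

```python
# Sketch: F1 is computed at one fixed threshold, AUC is computed from the
# scores over all thresholds. Labels and scores are made up for illustration.
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.4, 0.6, 0.45, 0.55, 0.8, 0.9])

# F1 needs hard labels, so a threshold (here 0.5) must be chosen first.
y_pred = (y_score >= 0.5).astype(int)
print("F1 at threshold 0.5:", f1_score(y_true, y_pred))   # 0.75

# AUC is computed from the scores directly; no threshold is chosen.
print("AUC:", roc_auc_score(y_true, y_score))             # 0.875
```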

I get that using 0.5 might not be optimal, but why don't we just pick the best possible threshold and report performance there, instead of averaging it with "bad thresholds" as AUC does? Why should we care about thresholds that don't work well and weigh them in?

Mr. Phil

1 Answer


You can see these other answers for the rationale for using AUC.

The way you talk about a "best possible threshold" and "bad thresholds" is a bit off. The correct threshold to use is application dependent: if you don't know ahead of time what false positive rate is acceptable for your application, then you can't say which thresholds are good or bad.

There is something called the equal error rate: the threshold at which the false positive rate equals the false negative rate (and which maximizes accuracy when the classes are balanced). However, in many cases you care about performance in a different region of the ROC curve than the point that gives the best accuracy. For example, if you are building a fingerprint sensor to control access to a secure building, you would set the threshold so that the false positive rate is very low. (It's okay to occasionally deny entry to authorized people in order to be sure of keeping out unauthorized ones.)

One way of thinking about AUC is that it averages over the different thresholds because you don't yet know which one you will want to pick.
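As a rough illustration, here is a minimal sketch of picking different operating points off the same ROC curve (the scores are synthetic and the target numbers are made up; it uses scikit-learn's roc_curve and roc_auc_score):

```python
# Sketch: the ROC curve enumerates all thresholds; which point you operate at
# depends on the application. Synthetic scores, made up for illustration.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y_true = np.concatenate([np.zeros(500, dtype=int), np.ones(500, dtype=int)])
y_score = np.concatenate([rng.normal(0.35, 0.15, 500),   # negatives
                          rng.normal(0.65, 0.15, 500)])  # positives

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC (area under the whole curve):", roc_auc_score(y_true, y_score))

# Security-style operating point: demand a very low false positive rate,
# then accept whatever true positive rate that threshold gives.
target_fpr = 0.01
i = np.searchsorted(fpr, target_fpr, side="right") - 1
print(f"threshold {thresholds[i]:.3f}: FPR {fpr[i]:.3f}, TPR {tpr[i]:.3f}")

# Equal-error-rate-style point: FPR equals the false negative rate (1 - TPR).
j = np.argmin(np.abs(fpr - (1 - tpr)))
print(f"EER threshold {thresholds[j]:.3f}: FPR {fpr[j]:.3f}, FNR {1 - tpr[j]:.3f}")
```

The AUC summarizes the whole curve; each application then picks one point on it.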

Aaron
  • +1 (maybe you want to format your hyperlinks in such a way that it is clear that two distinct threads are linked) – usεr11852 Jun 19 '17 at 00:20
  • @Aaron Thank you for your answer. I guess the way I talked was strange because I assumed the thresholds had to do with the output of a prediction; i.e., I thought an algorithm would output a value from 0 to 1, and that we would consider, for example, any value above 0.5 to be class A and anything below it class B. In that sense, we would want the threshold that best separated the outputs, not some average. But I realize now that all this rationale is incorrect, since the threshold has to do with false positive rates? Or am I still not getting the point? – Mr. Phil Jun 19 '17 at 14:18
  • Looks like you got it! – Aaron Jun 20 '17 at 04:22