1

I am working on a fraud detection algorithm using a banking dataset with a large number of transactions. The number of true fraud cases is very small (<1%), so accuracy is not a good measure: if we predict no fraud at all, we still get over 99% accuracy. I learnt that AUC can be a good measure in such cases, but I don't understand why. Can someone explain?

amalik2205
  • 125
  • 3
  • You can start here: https://medium.com/usf-msds/choosing-the-right-metric-for-evaluating-machine-learning-models-part-2-86d5649a5428 – Iman Jul 01 '19 at 21:25
  • Check the following links. They have defined both concepts well. [area-under-curve-of-roc-vs-overall-accuracy](https://stats.stackexchange.com/questions/68893/area-under-curve-of-roc-vs-overall-accuracy); [Why-is-AUC-a-better-measure-of-an-algorithms-performance-than-accuracy](https://www.quora.com/Why-is-AUC-a-better-measure-of-an-algorithms-performance-than-accuracy); & [advantages-of-auc-vs-standard-accuracy](https://datascience.stackexchange.com/questions/806/advantages-of-auc-vs-standard-accuracy). –  Jul 01 '19 at 21:22

2 Answers

6

Neither "classification" accuracy nor the $c$-index (concordance probability; AUROC) are proper accuracy scoring rules. "Accuracy" should be avoided at all costs, but the concordance probability is still a useful measure of pure predictive discrimination (separation of fraud and non-fraud on the basis of predicted probability of fraud). Concordance is the probability that of two chosen observations, one fraud and one non-fraud, the fraud is the one with a higher predicted probability. You can see how this will work fine even with extreme imbalance.

There are other measures that are more sensitive and statistically efficient; see, for example, http://fharrell.com/post/addvalue .
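To illustrate the "proper scoring rule" point from the answer, a self-contained sketch (simulated data, illustrative parameters) showing that the Brier score and log loss reward forecasting the true probabilities rather than hedging to the base rate:

```python
# A self-contained sketch of two proper scoring rules (Brier score, log loss).
# Data and probabilities are simulated for illustration only.
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

rng = np.random.default_rng(1)
# Rare-event probabilities, clipped away from 0/1 so log loss stays finite.
p_true = np.clip(rng.beta(0.5, 20, size=50_000), 1e-6, 1 - 1e-6)
y = rng.binomial(1, p_true)

honest = p_true                           # forecast the true probabilities
hedged = np.full_like(p_true, y.mean())   # always forecast the base rate

for name, p in [("true probabilities", honest), ("constant base rate", hedged)]:
    print(f"{name:18s}  Brier = {brier_score_loss(y, p):.5f}  "
          f"log loss = {log_loss(y, p):.5f}")
# Proper scoring rules are optimized, in expectation, by the true probabilities,
# so the honest forecast scores better on both.
```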

Frank Harrell
  • 74,029
  • 5
  • 148
  • 322
1

Accuracy is a legitimate validation metric when you are working with a balanced dataset. However, it is often the case in classification problems that there is a clear majority class. Also, errors are rarely symmetrical (for instance, in medicine, false positives and false negatives are not the same). From what I see in your question, you are already familiar with this concept.

On the other hand, AUC (I'd rather see the entire ROC curve, though) gives you an idea of how the true-positive/true-negative trade-off works: a model with a high AUC can detect a large number of true positives without losing its ability to detect true negatives, and vice versa.
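A hedged sketch of that sweep (simulated data and scores, not the poster's model): roc_curve exposes the (FPR, TPR) point behind each threshold, and the AUC summarizes the whole curve:

```python
# Sketch: the ROC curve as a sweep over decision thresholds (simulated data).
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(2)
y = (rng.random(20_000) < 0.01).astype(int)   # ~1% positives
scores = rng.normal(loc=1.2 * y, scale=1.0)   # illustrative model scores

fpr, tpr, thresholds = roc_curve(y, scores)
print(f"AUC = {roc_auc_score(y, scores):.3f}")
# A few points along the curve: lowering the threshold buys true positives
# at the price of more false positives.
for i in np.linspace(1, len(thresholds) - 1, 5, dtype=int):
    print(f"threshold {thresholds[i]:+.2f}: TPR = {tpr[i]:.2f}, FPR = {fpr[i]:.2f}")
```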

You may also be interested in the precision-recall curve.
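A matching sketch for the precision-recall curve (same kind of simulated, imbalanced data); under heavy imbalance, precision reacts strongly to false positives among the many negatives, which is why this curve is often preferred for fraud-like problems:

```python
# Sketch: the precision-recall curve on the same kind of imbalanced data.
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

rng = np.random.default_rng(3)
y = (rng.random(20_000) < 0.01).astype(int)
scores = rng.normal(loc=1.2 * y, scale=1.0)

precision, recall, thresholds = precision_recall_curve(y, scores)
print(f"average precision     = {average_precision_score(y, scores):.3f}")
print(f"baseline (prevalence) = {y.mean():.3f}")  # a random scorer's level
for i in np.linspace(0, len(thresholds) - 1, 5, dtype=int):
    print(f"threshold {thresholds[i]:+.2f}: "
          f"recall = {recall[i]:.2f}, precision = {precision[i]:.2f}")
```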

David
  • 2,422
  • 1
  • 4
  • 15
  • 2
    "Accuracy is a legitimate validation metric when you are working with a balanced dataset." The problem is, that there is another, completely different issue: Accuracy doesn't take costs into account. See many articles of Frank Harrell, or my answer here: https://stats.stackexchange.com/questions/368949/example-when-using-accuracy-as-an-outcome-measure-will-lead-to-a-wrong-conclusio/415422#415422 . – Tamas Ferenci Jul 02 '19 at 11:21
  • @TamasFerenci That's literally what is stated in the very next sentence: "Also, errors are rarely symmetrical (for instance, in medicine, false positives and false negatives are not the same)" – David Jul 02 '19 at 11:44
  • You're right, sorry, I didn't get "asymmetrical" at first glance. Nevertheless, I suggest not calling accuracy a "legitimate metric" in any case. Sorry again! – Tamas Ferenci Jul 02 '19 at 11:46
  • @TamasFerenci Why not? If men and women write me messages at a similar rate and I want to predict whether the last anonymous message I received comes from a male or female, I cannot think of a single situation where getting the right answer 97% of the time is not enough of a reason to feel happy about my work! – David Jul 02 '19 at 11:56
  • 3
    I can! If you really-really-really don't want to ever think of a sender as being female when he is indeed male (but you have much less of a problem with predicting a female sender as male), then you might very well prefer a model that never makes the former error and makes the latter in 10% of the cases (accuracy: 90%) over one which never commits the latter but commits the former in 3% of the cases (accuracy: 97%); see the numeric sketch after this thread. – Tamas Ferenci Jul 02 '19 at 12:21
  • @TamasFerenci But that will hardly ever happen in the case I presented, where the errors will pretty much always be symmetric. There are also disadvantages to manually building costs into your cost function – David Jul 02 '19 at 13:10
  • You're still better off with the (correct) cost-based approach, as it works in that case *as well* (just set the costs equal, no problem here!), in contrast to accuracy, which works only under special circumstances. – Tamas Ferenci Jul 02 '19 at 13:32
  • @TamasFerenci I disagree, since slight changes in "cost" will force you to retrain the entire model all over again. It is also not always possible to formalize a cost function in mathematical terms – David Jul 02 '19 at 13:40
  • That's indeed a problem, but using accuracy has nothing to do with this, as accuracy also *implies* a cost structure. I mean, accuracy *is* a cost-based decision, with *one particular* cost function, implicitly defined. If it is not even possible to formalize the cost function in your problem, then what makes you think that this particular one will be correct...? – Tamas Ferenci Jul 02 '19 at 13:46
  • @TamasFerenci What cost function should I use, then, when I don't know what the cost function is? It is true that accuracy can be thought of as a particular type of cost function, but it has a meaning beyond that (the overall proportion of times you screw up). Cost functions present additional problems: for example, unless we want to put a finite cash value on human life, every diagnostic test should return 100% positives if we attend to the cost function – David Jul 02 '19 at 13:49
  • If you don't know something, the best is to say that you don't know. In this case: if you have no information on the costs, you can run a sensitivity analysis, presenting results for different cost structures. It is very important to note that using accuracy essentially means that you say you *know* the cost structure, while your very starting point was just that you *don't know* it! In other words, the usage of accuracy will imply that you consider something to be known perfectly, while your starting point was just the total opposite, that it is not even possible to formalize in your problem. – Tamas Ferenci Jul 02 '19 at 13:56
  • "every diagnose test should return 100% positives if we attend to the cost function" No, that's absolutely not true. False positivity also has a cost attached (possibly more invasive or risky further diagnostic or treatment procedures, psychological stress of being diagnosed with an illness, monetary costs of further diagnostics etc.). – Tamas Ferenci Jul 02 '19 at 13:58
  • @TamasFerenci No. Presenting accuracy as a validation measure does not mean you know the cost function, because the fact that you use accuracy does not mean that you use ONLY accuracy. Different types of measures such as precision, recall, F1-score, accuracy and many others can be used simultaneously for model validation. Even when the cost is known, you should not leave the entire model-building/validating process to how it performs on one single metric – David Jul 02 '19 at 13:58
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/95625/discussion-between-david-and-tamas-ferenci). – David Jul 02 '19 at 13:59
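As referenced in the thread, here is a tiny numeric sketch of the asymmetric-cost scenario from the comments; the error rates match the quoted accuracies, while the cost values are purely illustrative assumptions:

```python
# Expected-cost comparison for the two hypothetical models in the comments.
# Error rates are fractions of ALL messages, matching the quoted accuracies;
# the cost values are illustrative assumptions, not from the thread.
COST_MALE_AS_FEMALE = 20.0  # the error we "really-really-really" want to avoid
COST_FEMALE_AS_MALE = 1.0

def expected_cost(rate_male_as_female: float, rate_female_as_male: float) -> float:
    """Average cost per message under the given per-message error rates."""
    return (rate_male_as_female * COST_MALE_AS_FEMALE
            + rate_female_as_male * COST_FEMALE_AS_MALE)

# Model A: never calls a male sender female, calls females male in 10% of cases.
print(f"Model A (accuracy 90%): expected cost {expected_cost(0.00, 0.10):.2f}")
# Model B: never calls a female sender male, calls males female in 3% of cases.
print(f"Model B (accuracy 97%): expected cost {expected_cost(0.03, 0.00):.2f}")
# With these costs the less accurate Model A is preferable (0.10 vs 0.60):
# accuracy implicitly assumes both errors cost the same.
```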