
I got the following chart:
[chart: sensitivity, accuracy and AUROC for each algorithm]

The algorithms have been applied to a dataset where the outcome is pretty rare: it occurs 10% of the time (binary: 0 = 90%, 1 = 10%). The response is whether a client is going to default or not.

I struggle to believe its values. Across the algorithms we see an increase in sensitivity, which makes total sense to me, and the accuracy stays constantly high (since the outcome is rare).

But shouldn't the AUROC be correlated with the sensitivity, since it is built from the hit rate and the false-alarm rate? And what would we expect for the accuracy ratio (Gini)?

In credit risk, AUROC is one of the most important benchmarks, and I am trying to figure out why. Looking at this table, sensitivity would make much more sense to me.
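For intuition only, here is a minimal sketch with simulated scores (a hypothetical model, not the data behind the chart). It shows why accuracy stays high on a 10% outcome almost regardless of the cutoff, why sensitivity moves strongly with the cutoff, and why AUROC, and with it the accuracy ratio (Gini = 2·AUROC − 1), depends only on the predicted probabilities and not on any cutoff:

```python
# Toy sketch with simulated scores (not the questioner's data): a ~10% default
# rate, and a hypothetical model whose probabilities are higher for defaulters.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

rng = np.random.default_rng(0)
n = 10_000
y = rng.binomial(1, 0.10, size=n)            # 1 = default (~10%), 0 = no default

# Hypothetical predicted default probabilities
p = np.clip(rng.normal(0.15 + 0.25 * y, 0.10), 0.0, 1.0)

# AUROC (and hence the Gini / accuracy ratio) uses the whole probability
# ranking -- no cutoff is involved.
auc = roc_auc_score(y, p)
print(f"AUROC = {auc:.3f}, Gini = 2*AUROC - 1 = {2 * auc - 1:.3f}")

# Sensitivity and accuracy are computed AFTER thresholding, so they change
# with the cutoff; accuracy stays high simply because defaults are rare.
for cutoff in (0.5, 0.3, 0.2):
    pred = (p >= cutoff).astype(int)
    print(f"cutoff {cutoff}: sensitivity = {recall_score(y, pred):.3f}, "
          f"accuracy = {accuracy_score(y, pred):.3f}")

# Even the useless rule "nobody defaults" is ~90% accurate on this data.
print("predict all 0: accuracy =", accuracy_score(y, np.zeros_like(y)))
```

In this sketch the sensitivity varies a lot between cutoffs while the accuracy barely moves, and the single AUROC value is the same for all of them, since it summarises the ranking over all possible cutoffs rather than the hit rate at one of them.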

  • Sensitivity is an improper scoring rule that does not consider the probability values from the model, just the side of some threshold on which that probability falls. AUROC considers those probability values. – Dave Apr 22 '21 at 11:14
  • @Dave true, it just considers a cut-off value. But shouldn't the goal be to catch most defaulters? Also, shouldn't the hit and false alarm rate be connected to the sensitivity? Does that mean that the specific cut-off value for the sensitivity has been poorly chosen? – Romero Azzalini Apr 22 '21 at 11:22
  • Maybe the threshold is a poor one. Maybe there shouldn’t be a threshold at all and users should base interest rates on default probabilities. After all, credit scores are on a continuum (more or less), not just good credit/bad credit. Thresholding too early wrecks the ability to do this. Stephan Kolassa and Frank Harrell both have written about this on here; Harrell also has blog posts: https://www.fharrell.com/post/class-damage/ and https://www.fharrell.com/post/classification/. // A standard software default is to threshold at $0.5$. I think that a $40\%$ chance of default is pretty high! – Dave Apr 22 '21 at 11:27
  • Very relevant: [Classification probability threshold](https://stats.stackexchange.com/q/312119/1352). Do not use accuracy to evaluate a classifier: [Why is accuracy not the best measure for assessing classification models?](https://stats.stackexchange.com/q/312780/1352) [Is accuracy an improper scoring rule in a binary classification setting?](https://stats.stackexchange.com/q/359909/1352) The same problems apply to sensitivity and specificity, and indeed to all evaluation metrics that rely on hard classifications. Instead, use probabilistic classifications, and evaluate these using [proper scoring rules](https://stats.stackexchange.com/tags/scoring-rules/info). – Stephan Kolassa Apr 22 '21 at 11:36
  • @Dave thank you, nice answers! Would it be possible to get a misleading AUROC as well, where in theory the prediction is poor but the AUROC value is high? – Romero Azzalini Apr 22 '21 at 12:25
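A minimal sketch of the point raised in the comments (two hypothetical models, not the ones behind the chart): AUROC only depends on the ranking of the predicted probabilities, so a badly calibrated model can have the same high AUROC as a well-calibrated one, while a proper scoring rule such as the Brier score or the log loss evaluates the probability values themselves:

```python
# Sketch of the comments' point: AUROC depends only on the ranking of the
# predicted probabilities, so a badly miscalibrated model can score the same
# AUROC as a well-calibrated one; proper scoring rules pick up the difference.
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss, roc_auc_score

rng = np.random.default_rng(1)
y = rng.binomial(1, 0.10, size=5_000)        # ~10% defaults

# Hypothetical well-behaved probabilities, and a version shifted upward by 0.30
# (same ranking, hence same AUROC, but every default probability is overstated).
p_ok = np.clip(rng.normal(0.10 + 0.30 * y, 0.08), 0.01, 0.99)
p_inflated = np.clip(p_ok + 0.30, 0.01, 0.99)

for name, p in [("reasonable", p_ok), ("inflated", p_inflated)]:
    print(f"{name:10s}  AUROC = {roc_auc_score(y, p):.3f}  "
          f"Brier = {brier_score_loss(y, p):.3f}  log loss = {log_loss(y, p):.3f}")
```

That is also one way in which an AUROC can be misleading: it says nothing about whether the predicted default probabilities are of a sensible magnitude, only about how well they order the clients.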
