Why is ROC AUC equivalent to the probability that two randomly-selected samples are correctly ranked?

Question

I found there are two ways to understand what AUC stands for but I couldn't get why these two interpretations are equivalent mathematically.

In the first interpretation, AUC is the area under the ROC curve. Picking points from 0 to 1 as threshold and calculate sensitivity and specificity accordingly. When we plot them against each other, we get ROC curve.

The second one is that the AUC of a classifier is equal to the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example, i.e. P(score(x+)>score(x−)). (from What does AUC stand for and what is it?)

Artem Sobolev · Accepted Answer · 2019-01-13T09:03:26.290

It's easy to see once you obtained a closed-form formula for AUC.

Since we have finite number of samples $\{(x_i, y_i)\}_{i=1}^N$, we'll have finite number of points on the ROC curve. We do linear interpolation in between.

First, some definitions. Suppose we'd like to evaluate an algorithm $A(x)$ that outputs a probability of $x$ lying in the positive class $+1$. Let's define $N_+$ as the number of samples in the positive class $+1$ and $N_-$ as the number of samples in the negative class $-1$. Now, for a threshold $\tau$ let's define False-Positive-Rate (FPR, aka 1-specificity) and True-Positive-Rate (TPR, aka sensitivity):

$$ \text{TPR}(\tau) = \frac{\sum_{i=1}^N [y_i = +1] [A(x_i) \ge \tau]}{N_+} \quad \text{and} \quad \text{FPR}(\tau) = \frac{\sum_{i=1}^N [y_i = -1] [A(x_i) \ge \tau]}{N_-} $$

(where $[\text{boolean expression}]$ is 1 if expression is true, and 0 otherwise). Then, ROC curve is built from points of the form $(\text{FPR}(\tau), \text{TPR}(\tau))$ for different values of $\tau$. Moreover, it's easy to see that if we order our samples $x_{(i)}$ (note the parentheses) according to the algorithm's output $A(x_i)$, then neither $\text{TPR}$ nor $\text{FPR}$ changes for $\tau$ between consecutive samples $A(x_{(i)}) < \tau < A(x_{(i+1)})$. So it's enough to evaluate FPR and TPR only for $\tau \in \{A(x_{(1)}), \dots, A(x_{(N)})\}$. For $k^{\text{th}}$ point we have

$$ \text{TPR}_k = \frac{\sum_{i=k}^N [y_{(i)} = +1]}{N_+} \quad \text{and} \quad \text{FPR}_k = \frac{\sum_{i=k}^N [y_{(i)} = -1]}{N_-} $$

(Note both sequences are non-increasing in $k$). These sequences define x and y coordinates of points on the ROC curve. Next, we linearly interpolate these points to get the curve itself and calculate area under the curve (Using a formula for area of a trapezoid):

$$ \begin{align*} \text{AUC} &= \sum_{k=1}^{N-1} \frac{\text{TPR}_{k+1} + \text{TPR}_{k}}{2} (\text{FPR}_{k} - \text{FPR}_{k+1}) \\ &= \sum_{k=1}^{N-1} \frac{\sum_{i=k+1}^N [y_{(i)} = +1] + \tfrac{1}{2} [y_{(k)} = +1]}{N_+} \frac{[y_{(k)} = -1]}{N_-} \\ &= \frac{1}{N_+ N_-} \sum_{k=1}^{N-1} \sum_{i=k+1}^N [y_{(i)} = +1] [y_{(k)} = -1] = \frac{1}{N_+ N_-} \sum_{k < i} [y_{(k)} < y_{(i)}] \end{align*} $$

Here I used the fact that $[y = -1] [y = +1] = 0$ for any $y$.

So there you have it: AUC is proportional to the number of correctly ordered pairs, which is proportional to the probability of random pair of samples being ranked according to their labels.

Thanks @Barmaley.exe. This is really helpful. Finally connect the dots. A following up question though, is how AUC is equivalent to Mann Whitney U test. — Felicia.H, Jan 15 '16 at 19:48
I don't have an answer to that, but either way it'd not be a good idea to answer another (mostly unrelated) question here. I suggest you create a new thread. — Artem Sobolev, Jan 15 '16 at 20:09
This is all covered in Hanley and McNeil 1982: https://pubs.rsna.org/doi/10.1148/radiology.143.1.7063747 (Wilcoxon-Mann-Whitney equivalence etc.). — Frank Harrell, Jan 13 '19 at 13:17
Can you please clarify how did the numerator before the last step is rearranged? where did $1/2[y_(k) = +1]$ go? — christopher, May 11 '20 at 20:51
Hi @ArtemSobolev and thank you for your input. I did not yet carefully think about the derivation but one question though: Is the set $\{A(x_{(1)}), \dots, A(x_{(N)})\}$ enough? I mean when $\tau \in A(x_{(1)})$ everything is classified as positive and when $\tau \in A(x_{(N)})$ only the $(x_{(N)}, y_{(N)})$ sample is classified as positive. Should we still try another threshold like $A(x_{(N)})+c$ where $c$ is e.g. a very small number? Wouldn't this threshold be the case where everything is classified as negative? Wouldn't we now cover all the $(TPR(\tau), FPR(\tau))$ cases? — jjepsuomi, Nov 04 '20 at 16:40
To add to the previous, would it be more correct to say instead of: "...it's enough to evaluate FPR and TPR only for $\tau \in \{A(x_{(1)}), \dots, A(x_{(N)})\}$..." that "...it's enough to evaluate FPR and TPR only for $\tau \in \{A(x_{(1)}), \dots, A(x_{(N)})\}$ which cover all the TPR,FPR cases except the one where all points are classified as negative." — jjepsuomi, Nov 04 '20 at 16:59

Why is ROC AUC equivalent to the probability that two randomly-selected samples are correctly ranked?

1 Answers1

Linked

Related