It's easy to see once you obtained a closed-form formula for AUC.
Since we have finite number of samples $\{(x_i, y_i)\}_{i=1}^N$, we'll have finite number of points on the ROC curve. We do linear interpolation in between.
First, some definitions. Suppose we'd like to evaluate an algorithm $A(x)$ that outputs a probability of $x$ lying in the positive class $+1$. Let's define $N_+$ as the number of samples in the positive class $+1$ and $N_-$ as the number of samples in the negative class $-1$. Now, for a threshold $\tau$ let's define False-Positive-Rate (FPR, aka 1-specificity) and True-Positive-Rate (TPR, aka sensitivity):
$$
\text{TPR}(\tau) = \frac{\sum_{i=1}^N [y_i = +1] [A(x_i) \ge \tau]}{N_+}
\quad \text{and} \quad
\text{FPR}(\tau) = \frac{\sum_{i=1}^N [y_i = -1] [A(x_i) \ge \tau]}{N_-}
$$
(where $[\text{boolean expression}]$ is 1 if expression is true, and 0 otherwise). Then, ROC curve is built from points of the form $(\text{FPR}(\tau), \text{TPR}(\tau))$ for different values of $\tau$. Moreover, it's easy to see that if we order our samples $x_{(i)}$ (note the parentheses) according to the algorithm's output $A(x_i)$, then neither $\text{TPR}$ nor $\text{FPR}$ changes for $\tau$ between consecutive samples $A(x_{(i)}) < \tau < A(x_{(i+1)})$. So it's enough to evaluate FPR and TPR only for $\tau \in \{A(x_{(1)}), \dots, A(x_{(N)})\}$. For $k^{\text{th}}$ point we have
$$
\text{TPR}_k = \frac{\sum_{i=k}^N [y_{(i)} = +1]}{N_+}
\quad \text{and} \quad
\text{FPR}_k = \frac{\sum_{i=k}^N [y_{(i)} = -1]}{N_-}
$$
(Note both sequences are non-increasing in $k$). These sequences define x
and y
coordinates of points on the ROC curve. Next, we linearly interpolate these points to get the curve itself and calculate area under the curve (Using a formula for area of a trapezoid):
$$
\begin{align*}
\text{AUC} &= \sum_{k=1}^{N-1} \frac{\text{TPR}_{k+1} + \text{TPR}_{k}}{2} (\text{FPR}_{k} - \text{FPR}_{k+1}) \\
&= \sum_{k=1}^{N-1} \frac{\sum_{i=k+1}^N [y_{(i)} = +1] + \tfrac{1}{2} [y_{(k)} = +1]}{N_+} \frac{[y_{(k)} = -1]}{N_-} \\
&= \frac{1}{N_+ N_-} \sum_{k=1}^{N-1} \sum_{i=k+1}^N [y_{(i)} = +1] [y_{(k)} = -1]
= \frac{1}{N_+ N_-} \sum_{k < i} [y_{(k)} < y_{(i)}]
\end{align*}
$$
Here I used the fact that $[y = -1] [y = +1] = 0$ for any $y$.
So there you have it: AUC is proportional to the number of correctly ordered pairs, which is proportional to the probability of random pair of samples being ranked according to their labels.