It is well known that the area under the ROC curve ($AUC$) is equal to the Mann-Whitney U statistic (cf. Why is ROC AUC equivalent to the probability that two randomly-selected samples are correctly ranked?). Therefore it has to be true that $AUC$ is asymptotically normal. However, $AUC$ is bounded below by zero and above by one, whereas a normally distributed random variable has unbounded support. How is this possible?
What justifies your deduction that "it has to be true that AUC is asymptotically normal"? This is not true; however, a related result based on the Central Limit Theorem is true. Thus, you can find the answer by pondering what the CLT asserts. – whuber Sep 06 '21 at 18:31
1 Answer
Mann-Whitney's $U$ is not equal to the ROC AUC; it is proportional to the area under the ROC curve:
$$\text{AUROC} = \frac{U}{n_1 n_0}$$ where $n_0$ is the number of negative examples, $n_1$ is the number of positive examples, and $U$ is the Mann-Whitney $U$ statistic.
From this expression, it should be clear that $U$ itself must be bounded: because $\text{AUROC}\in [0,1]$, we have $U \in [0, n_1 n_0]$. The same "boundedness" problem that you describe in your question therefore exists for the $U$ statistic.
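The relationship above can be checked with a minimal sketch. It assumes $U$ is computed by counting the (positive, negative) pairs in which the positive example scores higher, with ties counted as one half; the function names and the toy scores are made up for illustration.

```python
from itertools import product


def mann_whitney_u(pos_scores, neg_scores):
    """Count (positive, negative) pairs ranked correctly; a tie counts as 1/2."""
    u = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            u += 1.0
        elif p == n:
            u += 0.5
    return u


def auroc(pos_scores, neg_scores):
    """AUROC = U / (n1 * n0), so it lies in [0, 1] whenever U lies in [0, n1 * n0]."""
    n1, n0 = len(pos_scores), len(neg_scores)
    return mann_whitney_u(pos_scores, neg_scores) / (n1 * n0)


# Hypothetical classifier scores, chosen only to illustrate the bounds.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2, 0.1]

u = mann_whitney_u(pos, neg)
auc = auroc(pos, neg)
print(u, len(pos) * len(neg), auc)
```

Here $U$ can never exceed the number of pairs $n_1 n_0 = 12$, which is exactly why the ratio stays inside $[0, 1]$.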
I think what you mean is that $U$ is approximately normal (for some definition of "approximate") in the case of a large sample size. The Wikipedia entry has a nice description of how to standardize a $U$ statistic whose distribution is approximately normal because the sample size is large: https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test#Normal_approximation_and_tie_correction
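As a rough sketch of that standardization, assuming the null hypothesis (both groups drawn from the same distribution) and no ties, $U$ has mean $n_1 n_0 / 2$ and standard deviation $\sqrt{n_1 n_0 (n_1 + n_0 + 1)/12}$. The function name `u_z_score` and the example numbers are illustrative only, and no tie correction is applied.

```python
import math


def u_z_score(u, n1, n0):
    """Standardize U under the null hypothesis, assuming no ties."""
    mean_u = n1 * n0 / 2
    sd_u = math.sqrt(n1 * n0 * (n1 + n0 + 1) / 12)
    return (u - mean_u) / sd_u


# Illustrative values: U = 11 with 3 positives and 4 negatives.
# Samples this small are far from the large-sample regime the approximation needs.
z = u_z_score(11.0, 3, 4)
print(z)
```

The resulting $z$ is compared against a standard normal distribution; how good that comparison is depends on the sample sizes.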
It's important to keep in mind the advantages and disadvantages of using an approximation. It's not hard to find obvious disagreements between the normal approximation and the true distribution of $U$. If your view is that the resulting approximation error is too large for a particular purpose, then it's perfectly reasonable not to use the approximation and instead proceed with an approximation that has lower error, or with an exact method. On the other hand, these alternatives can be expensive, or even intractable, to compute, or simply more complex than we have time to implement and validate. Ultimately, approximations are a bargain, trading precision for convenience.
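One concrete disagreement is that the normal approximation puts some probability outside $[0, n_1 n_0]$, where the true distribution of $U$ has none. The sketch below estimates that mass, assuming the null-hypothesis mean and standard deviation without ties; the function name is hypothetical.

```python
import math


def normal_mass_outside_support(n1, n0):
    """Mass a normal approximation to U places outside [0, n1 * n0].

    Assumes the null-hypothesis mean and standard deviation, with no ties.
    """
    mean_u = n1 * n0 / 2
    sd_u = math.sqrt(n1 * n0 * (n1 + n0 + 1) / 12)
    # Both edges of the support sit the same number of standard deviations
    # away from the mean.
    z_edge = mean_u / sd_u
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z_edge / math.sqrt(2))


for n in (5, 50, 500):
    print(n, normal_mass_outside_support(n, n))
```

With balanced groups the edges of the support move away from the mean at a rate proportional to $\sqrt{n}$ standard deviations, so the misplaced mass shrinks quickly as the sample grows. It is never exactly zero, though, which is one sense in which the normal distribution is only an approximation.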

My point was that I did not understand how a random variable with compact support (in this case the $AUC$) can be asymptotically normal, given that the normal distribution has unbounded support. I think your last paragraph helped me resolve the misunderstanding: a random variable with compact support that is asymptotically normal will scale such that the probability mass of the normal pdf outside the support of the random variable is very close to zero. Correct? – lmaosome Sep 08 '21 at 20:32
Hm, to be honest, I would not have understood this comment if I had not read your last paragraph. Thank you! – lmaosome Sep 08 '21 at 20:57