I have a ROC curve for which I'd like to calculate the AUC. I'm getting different values using the trapezoidal and rank-based approaches. What I'm noticing is that the two values actually add to 1.0 and the ROC curve itself suggests that the value from the trapezoidal rule is correct. Ideas on what's going on?

Here's an example dataset (with code lifted from How to calculate Area Under the Curve (AUC), or the c-statistic, by hand)...

norm <- c(0.184, 0.250, 0.462, 0.424, 0.436, 0.136, 0.078, 0.166, 0.042, 0.542, 0.274, 0.130, 0.210, 0.364, 0.276, 0.262, 0.284, 0.138, 0.242, 0.092, 0.104, 0.070, 0.260, 0.320, 0.342, 0.168, 0.108, 0.068, 0.060, 0.220, 0.038, 0.090, 0.096, 0.480, 0.424, 0.060, 0.394, 0.226, 0.056, 0.250, 0.122, 0.532, 0.460, 0.088, 0.470, 0.070, 0.480, 0.216, 0.098, 0.586, 0.154, 0.620, 0.094, 0.534, 0.070, 0.240, 0.226, 0.762, 0.110, 0.202, 0.076, 0.436, 0.514, 0.390, 0.254, 0.254, 0.140, 0.192, 0.500, 0.226, 0.690, 0.158, 0.522, 0.306, 0.588, 0.060, 0.130, 0.450, 0.034, 0.280, 0.510, 0.042, 0.256, 0.062, 0.106, 0.104, 0.206, 0.346, 0.036, 0.192, 0.260, 0.212, 0.708, 0.118, 0.398, 0.290, 0.118, 0.532, 0.354, 0.422, 0.540, 0.202, 0.676, 0.544, 0.276, 0.066, 0.764, 0.230, 0.406, 0.572, 0.718, 0.008, 0.188, 0.260, 0.094, 0.406, 0.102, 0.050, 0.358, 0.384, 0.062, 0.298, 0.510, 0.722, 0.264)
abnorm <- c(0.090, 0.330, 0.052, 0.204, 0.376, 0.066, 0.362, 0.320, 0.278, 0.444, 0.504, 0.086, 0.170, 0.394, 0.384, 0.382, 0.152, 0.136, 0.098, 0.092, 0.154, 0.126, 0.502, 0.646, 0.086, 0.260, 0.108, 0.264, 0.246, 0.088, 0.154, 0.166, 0.028, 0.552, 0.218, 0.198, 0.186, 0.212, 0.040, 0.026, 0.110, 0.242, 0.096, 0.434, 0.134, 0.490, 0.302)
wi <- wilcox.test(abnorm, norm)
w <- wi$statistic
w/(length(abnorm)*length(norm))
#        W 
#0.4378723 
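For intuition, the rank-based value W/(n1*n2) is just the concordance probability: the chance that a randomly chosen abnorm score exceeds a randomly chosen norm score, with ties counting one half. A small sketch of that identity on hypothetical toy vectors (not the data above):

```r
# Hypothetical toy scores, just to illustrate the identity
neg <- c(0.1, 0.3, 0.5, 0.2)   # stand-in for norm
pos <- c(0.4, 0.2, 0.6)        # stand-in for abnorm

# Concordance probability: P(pos score > neg score), ties counted as 1/2
auc_rank <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))

# The same number from the Wilcoxon statistic, as in the code above
w <- suppressWarnings(wilcox.test(pos, neg))$statistic
c(auc_rank, unname(w) / (length(pos) * length(neg)))
```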


# truestat and testres are the 0/1 outcome and test result,
# defined as in the linked question
tab=as.matrix(table(truestat, testres))
tot=colSums(tab)
truepos=unname(rev(cumsum(rev(tab[2,]))))
falsepos=unname(rev(cumsum(rev(tab[1,]))))
totpos=sum(tab[2,])
totneg=sum(tab[1,])
sens=truepos/totpos
omspec=falsepos/totneg
sens=c(sens,0)
omspec=c(omspec,0)

height = (sens[-1]+sens[-length(sens)])/2
width = -diff(omspec) # = diff(rev(omspec))
sum(height*width)
# [1] 0.5621277
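One thing worth checking: if the two methods disagree on which class is coded as positive, their AUCs will be exact complements, since flipping the 0/1 labels maps an AUC of a to 1 - a. A sketch with hypothetical data:

```r
# Hypothetical scores and labels (1 = "abnormal")
score <- c(0.1, 0.4, 0.35, 0.8, 0.7, 0.2)
truth <- c(0, 0, 1, 1, 1, 0)

# Rank-based AUC: P(positive score > negative score), ties as 1/2
auc <- function(y, s) {
  pos <- s[y == 1]; neg <- s[y == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

auc(truth, score)      # with truth == 1 as positive
auc(1 - truth, score)  # labels flipped: exactly 1 minus the above
```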

When I use the ROC R package I get 0.438, and when I use the pROC package I get 0.562 - again, these sum to 1.0, making me think something weird is going on. I know these are both awful AUC values, but it's a bit disconcerting to see this level of difference.

Pat S

1 Answer

First of all, you didn't state why the ROC curve itself is relevant to the problem at hand. Since ROC curves are inconsistent with individual decision making and are based on backwards-time probabilities, it is hard to think of an example where ROC curves are helpful.

The $c$-index is the accepted nonparametric AUROC estimator. You can get it from the Wilcoxon test as you have done, or more directly with the somers2 function in the R Hmisc package, whose core computation is (mean(rank(x)[y == 1]) - (n1 + 1) / 2) / n2.

require(Hmisc)
somers2(c(norm, abnorm), c(rep(0, length(norm)), rep(1, length(abnorm))))

          C         Dxy           n     Missing 
  0.4378723  -0.1242553 172.0000000   0.0000000 
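To see what the quoted rank formula is doing, here is a hand computation on hypothetical toy data (x is the prediction, y the 0/1 outcome); it reproduces the pairwise concordance probability:

```r
# Hypothetical toy data
x <- c(0.1, 0.4, 0.35, 0.8, 0.7, 0.2)  # predictions
y <- c(0, 0, 1, 1, 1, 0)               # 1 = abnorm
n1 <- sum(y == 1)  # number of positives
n2 <- sum(y == 0)  # number of negatives

# The rank formula quoted above
(mean(rank(x)[y == 1]) - (n1 + 1) / 2) / n2
```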

You should be able to replicate this by evaluating all possible cutpoints that change sensitivity or specificity and applying the trapezoidal rule correctly.
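A sketch of that replication, assuming the abnormal group (coded 1) is the positive class; on hypothetical toy data the trapezoidal area matches the concordance probability exactly:

```r
# Hypothetical toy data (1 = positive/abnormal)
score <- c(0.1, 0.4, 0.35, 0.8, 0.7, 0.2)
truth <- c(0, 0, 1, 1, 1, 0)

# Sensitivity and 1 - specificity at every observed cutpoint
cuts   <- sort(unique(score))
sens   <- sapply(cuts, function(t) mean(score[truth == 1] >= t))
omspec <- sapply(cuts, function(t) mean(score[truth == 0] >= t))

# Add the (1, 1) and (0, 0) endpoints, then apply the trapezoidal rule
sens   <- c(1, sens, 0)
omspec <- c(1, omspec, 0)
height <- (sens[-1] + sens[-length(sens)]) / 2
width  <- -diff(omspec)
sum(height * width)
```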

If $Y=1$ is the correct coding for abnorm then your discrimination ability is worse than random guesses.

Frank Harrell

  • Thanks - can you explain why the trapezoidal approach would give 0.562? When I plot the ROC curve for these data, the points are above the 45 degree line. – Pat S May 13 '16 at 10:40
  • 1 - .562 = .438 so for one of the two methods you have reversed labels of $Y=0$ and $Y=1$. – Frank Harrell May 13 '16 at 11:34
  • @PatS Maybe see this: https://stackoverflow.com/q/47569442/6103040 – F. Privé Dec 08 '17 at 17:03
  • pROC automatically reverses the direction by default if the AUC comes out under 0.5... that's why. You did not switch the labels, but pROC did. (That's what the link from F. Privé also says.) – R. Prost Feb 20 '18 at 08:45
  • I hope that bug in `pROC` is corrected. – Frank Harrell Feb 20 '18 at 13:29
  • Perhaps this warrants its own question, but how can the $c$-index be worth calculating if the ROC curve from which it is derived is worthless? – Dave Oct 28 '20 at 16:19
  • It is a coincidence that each point on the ROC curve represents a transposed conditional (P(known given unknown)), so the ROC curve itself is at odds with decision making, yet its area is a useful pure measure of discrimination (concordance probability, aka the Wilcoxon-Mann-Whitney U statistic). The c-index is useful for describing one model but not for comparing models (too insensitive for that). – Frank Harrell Oct 29 '20 at 10:58