
I'm struggling to find a way to compute the p-value for the area under a receiver operating characteristic (ROC) curve. I have a continuous variable and a diagnostic test result, and I want to see whether the AUROC is statistically significant.

I found many packages dealing with ROC curves: pROC, ROCR, caTools, verification, Epi. But even after many hours spent reading the documentation and testing, I couldn't figure out how. I think I've just missed it.

user32530
    What could it possibly mean for the area under the curve to be 'significant'? – gung - Reinstate Monica Nov 09 '13 at 14:46
  • I meant testing whether the AUC value is statistically different from 0.5 – user32530 Nov 09 '13 at 16:08
  • What did your ROC curve come from? Presumably you want a test of that (eg, there is a p-value for a logistic regression model taken as a whole). – gung - Reinstate Monica Nov 09 '13 at 16:15
  • Well, my data is like the following: I have a standard test that makes the grouping into with/without disease, and I want to find a cut-off value for a biological determination from a blood sample. Besides that, I need the area under the curve. So no, I don't have any regression model. – user32530 Nov 10 '13 at 18:52
  • So you have some test that is performed on a sample of blood drawn from a patient, which gives you a number; & you will want to use that number to classify if the patient has the disease. At present, you have a set of numbers from this test for a set of patients where you know their true disease state. Is all of that correct? – gung - Reinstate Monica Nov 10 '13 at 19:05
  • Yes it is correct. – user32530 Nov 10 '13 at 19:37

4 Answers


In your situation it would be fine to plot a ROC curve, and to calculate the area under that curve, but this should be thought of as supplemental to your main analysis, rather than the primary analysis itself. Instead, you want to fit a logistic regression model.

The logistic regression model will come standard with a test of the model as a whole. (Actually, since you have only one variable, that p-value will be the same as the p-value for your test result variable.) That p-value is the one you are after. The model will allow you to calculate the predicted probability of an observation being diseased. A receiver operating characteristic (ROC) curve tells you how the sensitivity and specificity will trade off if you use different thresholds to convert the predicted probability into a predicted classification. Since the predicted probability is a function of your test result variable, it is also telling you how they trade off if you use different test result values as your threshold.
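
Not your exact data, of course, but a minimal sketch of this approach in R, assuming a data frame mydata with a hypothetical binary outcome disease (0/1, from your standard test) and the continuous blood measurement testval:

```r
# Hypothetical data: 'disease' is 0/1 status, 'testval' the blood measurement
fit <- glm(disease ~ testval, data = mydata, family = binomial)

# With a single predictor, the coefficient's Wald p-value tests the predictor
summary(fit)$coefficients["testval", "Pr(>|z|)"]

# Predicted probabilities, which the ROC curve thresholds over
pred <- predict(fit, type = "response")
```

A likelihood-ratio test of the model as a whole is available via anova(fit, test = "Chisq"); with a single predictor it answers the same question.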


If you are not terribly familiar with logistic regression, there are resources available on the internet besides the Wikipedia page.

gung - Reinstate Monica

Basically you want to test H0: "The AUC is equal to 0.5".

This is in fact equivalent to saying H0: "The distributions of the ranks in the two groups are equal".

The latter is the null hypothesis of the Mann-Whitney (Wilcoxon) test (see for instance Gold, 1999).

In other words, you can safely use a Mann-Whitney-Wilcoxon test to answer your question (see for instance Mason & Graham, 2002). This is exactly what the verification package mentioned by Franck Dernoncourt does.
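
The equivalence can be sketched in R with made-up numbers (x holding the diseased patients' values and y the healthy ones'):

```r
x <- c(2.1, 3.4, 4.0, 5.2, 3.8)  # diseased group (made-up values)
y <- c(1.0, 2.2, 1.8, 2.9)       # healthy group (made-up values)

wt <- wilcox.test(x, y)          # Mann-Whitney-Wilcoxon test
wt$p.value                       # the p-value for H0: "AUC = 0.5"

# With no ties, W / (n1 * n2) is exactly the AUC: here 18/20 = 0.9
wt$statistic / (length(x) * length(y))
```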

Calimo
  • Why would it be of interest to show that predictions are not random? That does not assess usefulness. – Frank Harrell Jan 31 '14 at 15:03
  • @FrankHarrell Because in many cases your predictions might not be better than random - in which case the usefulness you report is actually nil. Sure, reporting a confidence interval of the usefulness measures (sensitivity and specificity) would be more useful. But testing the difference between two groups is commonplace in clinical literature at least (and in fact there the groups often don't differ) and I saw reviewers asking for it specifically. – Calimo Jan 31 '14 at 15:33
  • That makes little sense IMHO. I want to know how useful something is, not whether it is better than just flipping a coin. – Frank Harrell Jan 31 '14 at 20:41
  • If it's not better than flipping a coin, then why would you go through all that work? Just flip the coin. – Him Feb 09 '17 at 14:24

You can use roc.area() from the package verification:

install.packages("verification")
library("verification")

# Data used from Mason and Graham (2002).
a <- c(1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990,
       1991, 1992, 1993, 1994, 1995)
# Binary event indicator (0/1), from the roc.area() example data
b <- c(0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1)
# Forecast probabilities
d <- c(.928, .576, .008, .944, .832, .816, .136, .584, .032, .016, .28, .024, 0, .984, .952)

A <- data.frame(a, b, d)
names(A) <- c("year", "event", "p2")

# For model without ties
roc.area(A$event, A$p2)

It will return $p.value [1] 0.0069930071

Franck Dernoncourt
  • Thank you so much, but I don't have any c and d values. I have a standard test that makes the grouping into with/without disease, and I want to find a cut-off value for a biological determination from a blood sample. Besides that, I need the area under the curve. So no, I don't have any regression. I have the stdtest binary variable and the biologicalvalue continuous variable. – user32530 Nov 10 '13 at 18:53
  • oh ok I thought you had d, as I assumed you already had a ROC curve. – Franck Dernoncourt Nov 11 '13 at 03:50
  • It is usually a mistake to seek an arbitrary cutoff when the true relationship with disease probability is smooth. Also, testing the null hypothesis that the ROC area is 0.5 is a quite boring hypothesis. For most predictions you care how good the prediction is, not whether it is random. – Frank Harrell Nov 11 '13 at 04:13
  • No problem, and thank you, Franck Dernoncourt; maybe there is a way to get d. – user32530 Nov 13 '13 at 20:59
  • In the medical field sometimes they need those cutoff points to create diagnosis tests. With those they want to find if the subject is ill or not, not to predict something. Sometimes they need to cut costs with a cheaper determination to identify the disease status. – user32530 Nov 13 '13 at 21:01
  • Thanks for your answer @FranckDernoncourt. How do you calculate the probabilities c? – ecjb Aug 20 '19 at 07:23

Two ROC curves can be compared in pROC using roc.test(), which also produces a p-value. In addition, calling roc(..., auc=TRUE, ci=TRUE) will give you the lower and upper confidence bounds along with the AUC while creating the ROC object, which may be useful.

The following is a working example that tests whether the miles per gallon or the weight of a car is the better predictor of the kind of transmission it comes equipped with (automatic or manual):

library(pROC)
roc_object_1 <- roc(mtcars$am, mtcars$mpg, auc=TRUE, ci=TRUE) # gives AUC and CI
roc_object_2 <- roc(mtcars$am, mtcars$wt, auc=TRUE, ci=TRUE)  # gives AUC and CI

roc.test(roc_object_1, roc_object_2) #gives p-value

The weight is a significantly better predictor than the fuel consumption, it seems. However, this compares two curves, rather than a single curve against a fixed value such as 0.5. Checking whether the confidence interval contains 0.5 tells us whether the AUC differs significantly from 0.5, but it does not produce a p-value.
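
If a one-sample check against 0.5 is still wanted, that confidence-interval inspection can be sketched directly (reusing roc_object_1 from above):

```r
ci_bounds <- ci.auc(roc_object_1)          # c(lower, AUC, upper), 95% by default
ci_bounds[1] > 0.5 || ci_bounds[3] < 0.5   # TRUE if 0.5 falls outside the CI
```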

naco
  • Does it provide the p-value also? – Michael R. Chernick Nov 06 '17 at 20:26
  • Although the question is asked specifically in terms of R, our general policy here is that we are a *statistics* (machine learning, etc) Q&A site. Thus, it is necessary for a Q to have statistical content, & it is strongly preferred that As are not only provided in software specific terms. In light of that, can you say more about what this test is & how it works, beyond just mentioning that it exists in R & showing the R code for it? – gung - Reinstate Monica Nov 08 '17 at 20:45
  • Ok, I will update my answer to reflect some statistical background – naco Nov 08 '17 at 20:54