It is easy to find a package calculating area under ROC, but is there a package that calculates the area under precision-recall curve?

4 Answers

As of July 2016, the package PRROC works great for computing both ROC AUC and PR AUC.

Assuming you already have a vector of predicted probabilities (called probs) from your model, and the true class labels are in your data frame as df$label (0 and 1), this code should work:

install.packages("PRROC")

library(PRROC)

# PRROC treats "class0" as the positive (foreground) class
fg <- probs[df$label == 1]  # scores of the positives
bg <- probs[df$label == 0]  # scores of the negatives

# ROC curve
roc <- roc.curve(scores.class0 = fg, scores.class1 = bg, curve = TRUE)
plot(roc)

# PR curve
pr <- pr.curve(scores.class0 = fg, scores.class1 = bg, curve = TRUE)
plot(pr)

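PS: The only disconcerting thing is that you pass scores.class0 = fg even though fg holds the scores for label 1, not 0. In PRROC, scores.class0 refers to the positive (foreground) class, whatever its label value is in your data.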

Here are the example ROC and PR curves with the areas under them:

ROC Curve with AUC

PR Curve with AUC

The bars on the right are the threshold probabilities at which a point on the curve is obtained.
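To get the areas as numbers rather than reading them off the plots, you can pull them out of the returned objects. This is a minimal sketch on simulated scores (the rnorm data is an assumption for illustration, not the data behind the plots above); it relies on the auc, auc.integral and auc.davis.goadrich fields that PRROC returns.

```r
library(PRROC)

set.seed(42)
# Simulated scores (illustrative only): positives score higher on average
fg <- rnorm(200, mean = 1)  # scores of the positive class (label 1)
bg <- rnorm(800, mean = 0)  # scores of the negative class (label 0)

# curve = TRUE is only needed for plotting; the AUCs are computed either way
roc_obj <- roc.curve(scores.class0 = fg, scores.class1 = bg)
pr_obj  <- pr.curve(scores.class0 = fg, scores.class1 = bg)

roc_auc   <- roc_obj$auc                # area under the ROC curve
pr_auc    <- pr_obj$auc.integral        # PR AUC by integration
pr_auc_dg <- pr_obj$auc.davis.goadrich  # PR AUC with Davis-Goadrich interpolation

print(c(roc = roc_auc, pr = pr_auc, pr_dg = pr_auc_dg))
```

The two PR values are usually close; which one to report is a matter of convention.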

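Note that for a random classifier, ROC AUC will be close to 0.5 irrespective of the class imbalance. The PR AUC, however, has no fixed baseline: for a random classifier it is roughly the fraction of positive examples, so it changes with the class balance (see What is "baseline" in precision recall curve).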
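A quick way to convince yourself of this is to score an imbalanced data set with random numbers. The sketch below uses base R only; the sample sizes are arbitrary assumptions, ROC AUC is computed from ranks (the Mann-Whitney statistic), and average precision stands in as one common estimate of PR AUC.

```r
set.seed(1)
n_pos <- 2000   # assumed number of positives (10% prevalence)
n_neg <- 18000  # assumed number of negatives
labels <- rep(c(1, 0), times = c(n_pos, n_neg))
scores <- runif(length(labels))  # random classifier: scores ignore the labels

# ROC AUC from ranks (Mann-Whitney U divided by n_pos * n_neg)
r <- rank(scores)
roc_auc <- (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Average precision: mean of precision@k taken at the rank of each positive
ord <- order(scores, decreasing = TRUE)
hits <- labels[ord]
precision_at_k <- cumsum(hits) / seq_along(hits)
avg_precision <- mean(precision_at_k[hits == 1])

prevalence <- n_pos / length(labels)
print(c(roc_auc = roc_auc, avg_precision = avg_precision, prevalence = prevalence))
```

ROC AUC should land near 0.5, while average precision should land near the prevalence rather than near 0.5.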

arun

A little googling turns up one Bioconductor package, qpgraph (qpPrecisionRecall), and one CRAN package, minet (auc.pr). I have no experience with either, though. Both were designed for biological networks.

chl
  • This minet looked nice, but it needs to have some external adapter to make appropriate input from general data :-( –  May 08 '11 at 09:09

Once you've got a precision recall curve from qpPrecisionRecall, e.g.:

pr <- qpPrecisionRecall(measurements, goldstandard)

you can calculate its AUC by doing this:

f <- approxfun(pr[, 1:2])
auc <- integrate(f, 0, 1)$value

The help page of qpPrecisionRecall gives details on the data structures it expects for its arguments.
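To see what the integration step does without installing qpgraph, here is a minimal sketch on a made-up two-column matrix of (recall, precision) points. The matrix and its column names are assumptions for illustration, not actual qpPrecisionRecall output. It compares the approxfun/integrate approach above with a plain trapezoidal rule; both treat the curve as piecewise linear, which is only an approximation for PR curves.

```r
# Hypothetical precision-recall points (illustrative only)
pr_pts <- cbind(Recall    = c(0, 0.25, 0.5, 0.75, 1),
                Precision = c(1, 0.9, 0.7, 0.5, 0.2))

# Approach from the answer: linear interpolation, then numerical integration
f <- approxfun(pr_pts[, 1:2])
auc_integrate <- integrate(f, 0, 1)$value

# Equivalent trapezoidal rule, computed directly from the points
o    <- order(pr_pts[, "Recall"])
rec  <- pr_pts[o, "Recall"]
prec <- pr_pts[o, "Precision"]
auc_trapezoid <- sum(diff(rec) * (head(prec, -1) + tail(prec, -1)) / 2)

print(c(integrate = auc_integrate, trapezoid = auc_trapezoid))
```

The trapezoidal version avoids the adaptive quadrature in integrate(), which can be convenient when the curve has many points.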

robertc
    Doesn't the PR-curve require some more fancy integration? See: http://mnd.ly/oWQQw1 –  Aug 31 '11 at 12:37

AUPRC() is a function in the PerfMeas package that is much faster than pr.curve() in the PRROC package when the data is very large. pr.curve() is a nightmare and takes forever to finish when you have vectors with millions of entries; PerfMeas takes seconds in comparison. PRROC is written in R, whereas PerfMeas is written in C.

leoluyi
jasoncolts