Questions tagged [roc]

Receiver Operating Characteristic, also known as the ROC curve.

The Receiver Operating Characteristic curve, also known as the ROC curve, is a graphical plot of the true positive rate against the false positive rate of a classifier as its discrimination threshold is varied.
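
As a quick illustration of that threshold sweep (a minimal sketch with made-up labels and scores, not part of the original tag description), each candidate threshold contributes one point of the curve:

```python
# Illustrative sketch: sweep the score threshold and record the
# (false positive rate, true positive rate) pair at each step.
# The labels and scores are hypothetical example data.

def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per distinct threshold, starting at the origin."""
    pos = sum(1 for y in labels if y == 1)   # actual positives
    neg = sum(1 for y in labels if y == 0)   # actual negatives
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / neg, tp / pos))
    return points

labels = [1, 1, 0, 1, 0, 0, 1, 0]                    # hypothetical true classes
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.2]   # hypothetical classifier scores
print(roc_points(labels, scores))                    # curve ends at (1.0, 1.0)
```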

The true positive rate, defined as the fraction of true positives out of all actual positives, is also called the sensitivity or recall. The false positive rate, defined as the fraction of false positives out of all actual negatives, is equivalent to 1 - specificity.
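
In the usual confusion-matrix notation (TP, FN, FP, TN for true positives, false negatives, false positives, and true negatives), these definitions read:

$$\mathrm{TPR} = \frac{TP}{TP + FN} \quad (\text{sensitivity, recall}), \qquad \mathrm{FPR} = \frac{FP}{FP + TN} = 1 - \text{specificity}.$$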

In its original form, the ROC curve was used to summarize performance of a binary classification task, although it can be extended for use in multi-class problems.

A classifier performing at chance is expected to have equal true positive and false positive rates, producing a diagonal line. Classifiers that exceed chance produce a curve above this diagonal. The area under the curve (AUC) is commonly used as a summary of the ROC curve and as a measure of classifier performance. The AUC equals the probability that the classifier will rank a randomly chosen positive case higher than a randomly chosen negative one, which is equivalent to the Wilcoxon rank-sum (Mann-Whitney U) statistic.
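
A small sketch of that ranking interpretation (continuing the hypothetical data above, not part of the original tag description): count the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half.

```python
# Illustrative sketch: AUC as the probability that a randomly chosen positive
# case outranks a randomly chosen negative one (ties counted as 1/2).

def auc_by_ranking(labels, scores):
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.2]
print(auc_by_ranking(labels, scores))   # 0.75, the trapezoidal area under the ROC points above
```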

ROC curves enable visualizing and organizing classifier performance without regard to class distributions or error costs. This can be helpful when investigating learning with skewed distributions or cost-sensitive learning.

814 questions

275 votes · 6 answers

What does AUC stand for and what is it?

Searched high and low and have not been able to find out what AUC, as it relates to prediction, stands for or means.
josh

226 votes · 4 answers

ROC vs precision-and-recall curves

I understand the formal differences between them; what I want to know is when it is more relevant to use one vs. the other. Do they always provide complementary insight about the performance of a given classification/detection system? When is it…
Amelio Vazquez-Reina

95 votes · 5 answers

How to calculate Area Under the Curve (AUC), or the c-statistic, by hand

I am interested in calculating area under the curve (AUC), or the c-statistic, by hand for a binary logistic regression model. For example, in the validation dataset, I have the true value for the dependent variable, retention (1 = retained; 0 = not…
Matt Reichenbach

89 votes · 5 answers

How to plot ROC curves in multiclass classification?

In other words, instead of having a two-class problem, I am dealing with 4 classes and would still like to assess performance using AUC.
CLOCK

76 votes · 1 answer

Understanding ROC curve

I'm having trouble understanding the ROC curve. Is there any advantage / improvement in area under the ROC curve if I build different models from each unique subset of the training set and use it to produce a probability? For example, if $y$ has…
Tay Shin

55 votes · 6 answers

How to determine best cutoff point and its confidence interval using ROC curve in R?

I have data from a test that could be used to distinguish normal and tumor cells. According to the ROC curve it looks good for this purpose (area under the curve is 0.9). My questions are: How to determine the cutoff point for this test and its confidence…
Yuriy Petrovskiy

46 votes · 7 answers

How to choose between ROC AUC and F1 score?

I recently completed a Kaggle competition in which the ROC AUC score was used, as per the competition requirements. Before this project, I normally used the F1 score as the metric to measure model performance. Going forward, I wonder how I should choose between…
George Liu

45 votes · 6 answers

How to determine the optimal threshold for a classifier and generate ROC curve?

Let's say we have an SVM classifier: how do we generate the ROC curve, theoretically speaking (since we generate a TPR and FPR at each threshold)? And how do we determine the optimal threshold for this SVM classifier?
RockTheStar

37 votes · 4 answers

Area under curve of ROC vs. overall accuracy

I am a little bit confused about the Area Under the Curve (AUC) of the ROC and the overall accuracy. Will the AUC be proportional to the overall accuracy? In other words, when we have a larger overall accuracy, will we definitely get a larger AUC? Or are…
Samo Jerom

35 votes · 3 answers

Why is AUC higher for a classifier that is less accurate than for one that is more accurate?

I have two classifiers: A, a naive Bayesian network, and B, a tree (singly-connected) Bayesian network. In terms of accuracy and other measures, A performs comparatively worse than B. However, when I use the R packages ROCR and AUC to perform ROC analysis,…
Jane Wayne

33 votes · 4 answers

Optimising for Precision-Recall curves under class imbalance

I have a classification task where I have a number of predictors (one of which is the most informative), and I am using the MARS model to construct my classifier (I am interested in any simple model, and using glms for illustrative purposes would be…

30 votes · 3 answers

What is the difference in what AIC and c-statistic (AUC) actually measure for model fit?

Akaike Information Criterion (AIC) and the c-statistic (area under ROC curve) are two measures of model fit for logistic regression. I am having trouble explaining what is going on when the results of the two measures are not consistent. I guess…
timbp

30 votes · 3 answers

Can AUC-ROC be between 0 and 0.5?

Can AUC-ROC values be between 0 and 0.5? Does the model ever output values between 0 and 0.5?
Aman

29 votes · 3 answers

ROC curve for discrete classifiers like SVM: Why do we still call it a "curve"? Isn't it just a "point"?

In the discussion of how to generate an ROC curve for binary classification, I think that the confusion was that a "binary classifier" (which is any classifier that separates 2 classes) was for Yang what is called a "discrete classifier" (which…
Abdelhak Mahmoudi

28 votes · 3 answers

ROC vs Precision-recall curves on imbalanced dataset

I just finished reading this discussion. They argue that PR AUC is better than ROC AUC on an imbalanced dataset. For example, we have 10 samples in the test dataset. 9 samples are positive and 1 is negative. We have a terrible model which predicts…