Questions tagged [precision-recall]

P&R are a way to measure the relevance of a set of retrieved instances. Precision is the % of retrieved instances that are correct. Recall is the % of true instances that are retrieved. The harmonic mean of P&R is the F1-score. P&R are used in data mining to evaluate classifiers.

Precision and recall constitute a way to measure the relevance of a set of retrieved instances. Precision is the proportion of retrieved instances that are correct; mathematically, precision is equivalent to the positive predictive value. Recall is the proportion of all true instances that are retrieved; it is equivalent to sensitivity. Precision and recall are commonly used in data mining contexts to evaluate classifiers, just as sensitivity and specificity are used in statistics to evaluate the discriminative ability of a logistic regression model. They can be examined individually or combined (via their harmonic mean) to create the F1-score:
$$ F_1 = \frac{2}{\frac{1}{{\rm Precision}}+\frac{1}{{\rm Recall}}} = 2\times\frac{{\rm Precision}\times {\rm Recall}}{{\rm Precision}+{\rm Recall}} $$


(Because precision and recall are closely related to, and easily confused with, sensitivity and specificity, the following attempts to disentangle them.)

If a classifier can call an object positive (relevant) or not, and the object can be positive or not in reality, there are four possible combinations (represented by a confusion matrix):

                                         Reality:
                                  Positive     Negative
            Classification:    ---------------------------
                              |             |             |
                   'positive' |     TP      |     FP      |
                              |             |             |
                               ---------------------------
                              |             |             |
                   'negative' |     FN      |     TN      |
                              |             |             |
                               ---------------------------

where TP is true positive, FP is false positive, FN is false negative, and TN is true negative. Then:
$$ {\rm Precision} = \frac{TP}{TP + FP} $$ (By contrast, specificity is: $\frac{\color{red}{TN}}{\color{red}{TN} + FP}$.)

and: $$ {\rm Recall} = ({\rm Sensitivity}) = \frac{TP}{TP + FN} $$

There are other ways of parsing a confusion matrix, such as computing the positive predictive value and negative predictive value; as noted above, precision is the same as the positive predictive value.

Together with the class prevalence, either precision and recall or sensitivity and specificity determine the full confusion matrix (up to its total count), so either pair provides complete information about the performance of a classifier. Which set is used is mostly a matter of convention.
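To make the definitions above concrete, here is a minimal Python sketch that computes precision, recall, specificity and F1 from the four cells of a confusion matrix; the counts are made up purely for illustration.

```python
# Illustrative (made-up) confusion-matrix counts.
TP, FP, FN, TN = 80, 20, 40, 860

precision = TP / (TP + FP)            # = positive predictive value
recall = TP / (TP + FN)               # = sensitivity
specificity = TN / (TN + FP)          # contrast with precision
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"specificity={specificity:.3f} F1={f1:.3f}")
# precision=0.800 recall≈0.667 specificity≈0.977 F1≈0.727
```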

425 questions
226 votes, 4 answers

ROC vs precision-and-recall curves

I understand the formal differences between them, what I want to know is when it is more relevant to use one vs. the other. Do they always provide complementary insight about the performance of a given classification/detection system? When is it…
Amelio Vazquez-Reina
110 votes, 4 answers

How do you calculate precision and recall for multiclass classification using confusion matrix?

I wonder how to compute precision and recall using a confusion matrix for a multi-class classification problem. Specifically, an observation can only be assigned to its most probable class / label. I would like to compute: Precision = TP / (TP+FP)…
daiyue
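One common convention (a sketch, not necessarily the asker's final approach): with rows as true classes and columns as predicted classes, per-class precision is the diagonal divided by the column sums and per-class recall is the diagonal divided by the row sums. The 3×3 matrix below is hypothetical.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = true class, columns = predicted class.
cm = np.array([[50,  3,  2],
               [ 5, 40,  5],
               [ 2,  8, 35]])

tp = np.diag(cm)
precision_per_class = tp / cm.sum(axis=0)  # TP / (TP + FP): divide by column sums
recall_per_class    = tp / cm.sum(axis=1)  # TP / (TP + FN): divide by row sums

print("precision:", np.round(precision_per_class, 3))
print("recall:   ", np.round(recall_per_class, 3))
```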
92 votes, 8 answers

How to compute precision/recall for multiclass-multilabel classification?

I'm wondering how to calculate precision and recall measures for multiclass multilabel classification, i.e. classification where there are more than two labels, and where each instance can have multiple labels?
Vam
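For multilabel problems, one common approach is to average per-label precision and recall; scikit-learn exposes this via the average argument. A sketch with made-up indicator matrices (rows = instances, columns = labels):

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical multilabel ground truth and predictions as binary indicator matrices.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 0, 0],
                   [0, 0, 1]])

for avg in ("micro", "macro", "samples"):
    p, r, f, _ = precision_recall_fscore_support(
        y_true, y_pred, average=avg, zero_division=0)
    print(f"{avg:>8}: precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```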
71 votes, 10 answers

How to interpret F-measure values?

I would like to know how to interpret a difference of f-measure values. I know that f-measure is a balanced mean between precision and recall, but I am asking about the practical meaning of a difference in F-measures. For example, if a classifier C1…
AM2
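One reason a single F value is hard to interpret on its own: very different precision/recall trade-offs can yield the same F1. A tiny illustration with arbitrary numbers:

```python
# Two hypothetical classifiers with the same F1 but very different error profiles.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(f1(0.5, 0.5))    # 0.5  (balanced precision and recall)
print(f1(1 / 3, 1.0))  # 0.5  (perfect recall, poor precision)
```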
62 votes, 3 answers

F1/Dice-Score vs IoU

I was confused about the differences between the F1 score, Dice score and IoU (intersection over union). By now I found out that F1 and Dice mean the same thing (right?) and IoU has a very similar formula to the other two. F1 / Dice:…
pietz
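The usual algebraic relationship between the two overlap measures, F1 = 2·IoU/(1+IoU) and equivalently IoU = F1/(2−F1), can be checked numerically; the overlap counts below are arbitrary.

```python
# Quick numeric check of the Dice/F1 vs IoU relationship with made-up counts.
TP, FP, FN = 30, 10, 20

dice = 2 * TP / (2 * TP + FP + FN)   # F1 / Dice
iou = TP / (TP + FP + FN)            # Jaccard / IoU

assert abs(dice - 2 * iou / (1 + iou)) < 1e-12
assert abs(iou - dice / (2 - dice)) < 1e-12
print(dice, iou)  # ≈0.667, 0.5
```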
44 votes, 1 answer

What do the numbers in the classification report of sklearn mean?

I have below an example I pulled from sklearn's sklearn.metrics.classification_report documentation. What I don't understand is why there are f1-score, precision and recall values for each class, where I believe the class is the predictor label? I…
jxn
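For context, the report is computed per class (each class treated as the "positive" class in turn), followed by averaged rows. A minimal sketch with made-up labels:

```python
from sklearn.metrics import classification_report

# Toy true and predicted labels, purely for illustration.
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

# One row of precision / recall / f1-score / support per class,
# plus accuracy and macro/weighted averages.
print(classification_report(y_true, y_pred, digits=3))
```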
43 votes, 2 answers

Area under Precision-Recall Curve (AUC of PR-curve) and Average Precision (AP)

Is Average Precision (AP) the area under the Precision-Recall curve (AUC of the PR curve)? EDIT: here is a comment about the difference between PR AUC and AP. The AUC is obtained by trapezoidal interpolation of the precision. An alternative and usually…
mrgloom
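One way to see the difference in practice is to compute both summaries on the same scores; the data below are random and purely illustrative. The trapezoidal area under the PR points and scikit-learn's average precision (a step-wise sum) are typically close but not identical.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc, average_precision_score

rng = np.random.default_rng(0)
# Made-up labels and weakly informative scores.
y_true = rng.integers(0, 2, size=200)
y_score = 0.3 * y_true + 0.7 * rng.random(200)

precision, recall, _ = precision_recall_curve(y_true, y_score)
pr_auc = auc(recall, precision)                # trapezoidal interpolation
ap = average_precision_score(y_true, y_score)  # step-wise, no interpolation

print(f"trapezoidal PR AUC = {pr_auc:.4f}, AP = {ap:.4f}")
```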
39 votes, 3 answers

What are correct values for precision and recall when the denominators equal 0?

Precision is defined as: p = true positives / (true positives + false positives) What is the value of precision if (true positives + false positives) = 0? Is it just undefined? Same question for recall: r = true positives / (true positives +…
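The ratio is mathematically undefined when its denominator is zero, so libraries make you (or a default) pick a convention. For example, scikit-learn exposes this through the zero_division argument:

```python
from sklearn.metrics import precision_score, recall_score

# No positive predictions at all: the precision denominator (TP + FP) is 0.
y_true = [0, 0, 1, 1]
y_pred = [0, 0, 0, 0]

# Undefined mathematically; the chosen convention is made explicit here.
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0
print(precision_score(y_true, y_pred, zero_division=1))  # 1.0
print(recall_score(y_true, y_pred, zero_division=0))     # recall is defined here: 0.0
```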
35 votes, 3 answers

Classification/evaluation metrics for highly imbalanced data

I deal with a fraud detection (credit-scoring-like) problem. As such there is a highly imbalanced relation between fraudulent and non-fraudulent observations. http://blog.revolutionanalytics.com/2016/03/com_class_eval_metrics_r.html provides a great…
33 votes, 4 answers

Optimising for Precision-Recall curves under class imbalance

I have a classification task where I have a number of predictors (one of which is the most informative), and I am using the MARS model to construct my classifier (I am interested in any simple model, and using glms for illustrative purposes would be…
28 votes, 3 answers

ROC vs Precision-recall curves on imbalanced dataset

I just finished reading this discussion. They argue that PR AUC is better than ROC AUC on an imbalanced dataset. For example, we have 10 samples in the test dataset: 9 samples are positive and 1 is negative. We have a terrible model which predicts…
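A generic sketch (not the asker's exact 10-sample example, which is truncated above) comparing the two summaries on imbalanced toy data with a rare positive class:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(1)
# Imbalanced toy data (about 10% positives); scores are only weakly informative.
y_true = (rng.random(1000) < 0.1).astype(int)
y_score = 0.2 * y_true + rng.random(1000)

print("ROC AUC:", round(roc_auc_score(y_true, y_score), 3))
print("AP (PR AUC summary):", round(average_precision_score(y_true, y_score), 3))
# With a rare positive class, AP is anchored to the prevalence (~0.1 for a
# random ranking), whereas ROC AUC is anchored to 0.5, so the two can tell
# rather different stories about the same scores.
```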
27 votes, 4 answers

What are correct values for precision and recall in edge cases?

Precision is defined as: p = true positives / (true positives + false positives) Is it correct that, as true positives and false positives approach 0, the precision approaches 1? Same question for recall: r = true positives / (true positives +…
Björn Pollex
26 votes, 2 answers

What is "baseline" in precision recall curve

I'm trying to understand the precision-recall curve. I understand what precision and recall are, but the thing I don't understand is the "baseline" value. I was reading this link…
hyeri
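The baseline usually quoted for a precision-recall curve is the precision of a classifier that labels instances positive at random: it equals the prevalence of the positive class, regardless of recall. A tiny sketch with made-up labels:

```python
import numpy as np

# Made-up labels with 10% positives.
y_true = np.array([0] * 90 + [1] * 10)

# Random guessing yields precision equal to the positive-class prevalence,
# drawn as a horizontal line on the PR plot.
baseline = y_true.mean()
print(baseline)  # 0.1
```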
25 votes, 5 answers

What impact does increasing the training data have on the overall system accuracy?

Can someone summarize for me, with possible examples, in what situations increasing the training data improves the overall system? When do we detect that adding more training data could possibly over-fit the data and not give good accuracies on the test…
madCode
21 votes, 3 answers

High Recall - Low Precision for unbalanced dataset

I’m currently encountering some problems analyzing a tweet dataset with support vector machines. The problem is that I have an unbalanced binary class training set (5:2), which is expected to be proportional to the real class distribution. When…