Consider a case where the number of examples labelled 0 is 1400 and the number labelled 1 is 100. This dataset is imbalanced, with the majority of examples belonging to the normal class (0) and the minority to class 1.
The data labelled 0 denote normal operating conditions and the data labelled 1 denote abnormal conditions. I have taken 1 as the positive class and 0 as the negative class.
Assume the following confusion matrix is obtained for the binary classification:
cmMatrix =
             predicted 0    predicted 1
truth 0      1100 (TN)      300 (FP)
truth 1        30 (FN)       70 (TP)
cmMatrix = [1100,300;30,70];                       % rows = truth, columns = predicted
acc_0 = 100*(cmMatrix(1,1))/sum(cmMatrix(1,:));    % % of class-0 examples classified correctly
acc_1 = 100*(cmMatrix(2,2))/sum(cmMatrix(2,:));    % % of class-1 examples classified correctly
will give acc_0 = 78.5714
and acc_1 = 70
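As a cross-check of the MATLAB snippet above, the same per-class accuracies can be reproduced with a minimal NumPy sketch (variable names are mine):

```python
import numpy as np

# Confusion matrix from the post: rows = truth (0, 1), columns = predicted (0, 1)
cm = np.array([[1100, 300],
               [30,   70]])

# Per-class accuracy: correct predictions for a class divided by its row total
acc_0 = 100 * cm[0, 0] / cm[0].sum()  # 1100 / 1400
acc_1 = 100 * cm[1, 1] / cm[1].sum()  # 70 / 100

print(round(acc_0, 4), acc_1)  # 78.5714 70.0
```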
The confusion matrix is read as follows: out of 1400 normal events, 1100 are correctly identified as normal and 300 are incorrectly identified as abnormal. Out of 100 abnormal events, 70 are correctly detected as abnormal, whereas 30 are incorrectly detected as normal. I want to calculate the sensitivity and specificity for class 1, since that is of primary interest in abnormal event detection. This is how I did it:
Sensitivity = TP/(TP+FN) = 70/(70+30) = 0.70
Specificity = TN/(TN+FP) = 1100/(1100+300) = 0.7857
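The two formulas above can also be checked directly from the matrix entries; a short Python sketch (cell names follow the matrix layout in the post):

```python
import numpy as np

cm = np.array([[1100, 300],   # truth 0: TN, FP
               [30,   70]])   # truth 1: FN, TP

TN, FP = cm[0]
FN, TP = cm[1]

sensitivity = TP / (TP + FN)  # recall of the positive class (1)
specificity = TN / (TN + FP)  # recall of the negative class (0)

print(sensitivity, round(specificity, 4))  # 0.7 0.7857
```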
Q1) In this example, the sensitivity for class 1 equals the accuracy for class 1. Is it always the case that the sensitivity of each class equals its individual class accuracy?
Q2) Precision for class 1: TP/(predicted 1) = TP/(TP+FP) = 70/(70+300) = 0.1892,
which is much lower than class 1's accuracy. Is precision not connected to individual class accuracy?
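The gap comes from precision being computed down the "predicted 1" column rather than along the "truth 1" row, so the 300 false positives contributed by the large normal class dominate the denominator. A minimal sketch of that calculation:

```python
cm = [[1100, 300],   # truth 0: TN, FP
      [30,   70]]    # truth 1: FN, TP

TP = cm[1][1]
FP = cm[0][1]

# Precision divides by the column total (everything predicted as class 1),
# which mixes in false positives from the majority class.
precision_1 = TP / (TP + FP)  # 70 / 370

print(round(precision_1, 4))  # 0.1892
```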
Q3) For an imbalanced dataset, how can we tell whether the classifier has done a good job?