How can I derive confidence intervals from the confusion matrix for a classifier?

Question

I am using k-fold cross-validation to generate a confusion matrix for a classifier. I need to calculate 95% confidence intervals for the number of times each class is predicted when the classifier is run against a set of input data.

So if my output after running 2000 samples through the classifier is:

Class A: 100
Class B: 1400
Class C: 500

I want to be able to report:

Class A: 100   +- (some value for a 95% interval)
Class B: 1400  +- (some value for a 95% interval)
Class C: 500   +- (some value for a 95% interval)

The interval for each class would depend on how good the classifier is for that class as indicated by the confusion matrix.

If this makes sense please give me some hints. Otherwise please point me in a better direction. I need something simple to report to unsophisticated users.

David M W Powers · Answer 1 · 2016-03-29T06:46:04.917

The question makes good sense. It specifically notes that the contingency table is the result of cross-validation. Witten et al.'s Data Mining book (based around Weka) discusses a modified t-test for (repeated) cross-validation, and a t-test implicitly defines a confidence interval. Because we have a CV and each cell is an averaged statistic, CIs do exist per cell, although they are most commonly calculated for the marginal statistics and, directly or via those, for whole-of-table statistics.
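As a rough illustration of how a t-test on cross-validation folds turns into an interval, here is a minimal sketch. It assumes the modified test referred to is the corrected resampled t-test (the variance correction usually attributed to Nadeau and Bengio), which inflates the variance by n_test/n_train to allow for overlapping training sets. The per-fold recall values, the fold sizes and the function name are hypothetical, and the t critical value is passed in by hand (2.262 is the two-sided 95% value for 9 degrees of freedom).

```python
import math
import statistics

def corrected_cv_interval(fold_values, n_train, n_test, t_crit):
    """Interval for the mean of a per-fold statistic, using the
    corrected resampled variance (1/k + n_test/n_train) * var."""
    k = len(fold_values)
    mean = statistics.mean(fold_values)
    var = statistics.variance(fold_values)  # sample variance (k - 1 denominator)
    half_width = t_crit * math.sqrt((1.0 / k + n_test / n_train) * var)
    return mean - half_width, mean + half_width

# Hypothetical per-fold recall for one class, 10-fold CV on 2000 samples
recalls = [0.70, 0.74, 0.68, 0.72, 0.71, 0.69, 0.75, 0.73, 0.70, 0.72]
low, high = corrected_cv_interval(recalls, n_train=1800, n_test=200, t_crit=2.262)
print(f"recall: {statistics.mean(recalls):.3f} ({low:.3f}, {high:.3f})")
```

The same function could be applied to any per-fold statistic (a cell proportion, precision, recall), provided the statistic is recorded per fold rather than only as a pooled table.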

In the following paper I explore adapting various generalizations of the confidence intervals applied to correlation to useful multiclass cases, and validate them with Monte Carlo simulation. It is difficult to give a clear recommendation, as the same measure can be overly conservative in some cases and insufficiently conservative in others. Nonetheless, a reasonable choice is suggested and illustrated in simulations across a range of parameterizations:

DMW Powers. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. International Journal of Machine Learning Technology 2 (1), 37-63.

Recall and precision can be calculated from a contingency table by dividing a diagonal entry by the appropriate marginal sum, and their inverses from the complement of the diagonal against the margins (or simply by converting to binary tables). Confidence intervals can then be defined using the Wald or Wilson techniques. A useful rule of thumb, introduced by Agresti et al. under the normal distribution assumption at alpha=0.05, is to add 2 positive and 2 negative examples. Tony Cai shows this is appropriate for the Binomial distribution and gives modified versions for the Negative Binomial (not applicable here) and the Poisson (arguably applicable, and used as an assumption in some of my derivations above).
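To make the Wilson and "add 2 positives and 2 negatives" intervals concrete, here is a minimal sketch using only the standard library. The confusion matrix is hypothetical (rows are real classes, columns are predicted classes, chosen so the column sums match the 100/1400/500 counts in the question), and the function names are illustrative. With z close to 2, the Agresti-Coull adjustment adds z²/2 ≈ 2 successes and z²/2 ≈ 2 failures.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def agresti_coull_interval(successes, n, z=1.96):
    """Wald interval after adding z^2/2 successes and z^2/2 failures."""
    n_adj = n + z * z
    p_adj = (successes + z * z / 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half, p_adj + half

# Hypothetical confusion matrix: rows = real class, columns = predicted class
labels = ["A", "B", "C"]
matrix = [
    [80, 60, 10],
    [15, 1300, 40],
    [5, 40, 450],
]

for i, label in enumerate(labels):
    hit = matrix[i][i]
    real_total = sum(matrix[i])                    # row sum -> recall
    pred_total = sum(row[i] for row in matrix)     # column sum -> precision
    print(label,
          "recall", wilson_interval(hit, real_total),
          "precision", wilson_interval(hit, pred_total))
```

Both functions treat each diagonal cell as a binomial count out of its margin, which ignores any dependence introduced by cross-validation.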

The Poisson modification is probably most applicable here (think in terms of when another PpRr Predicted/Real pair might arrive) as it is focussed only on the class of interest and doesn't distribute errors/negativity amongst the other classes. It adds two more arrivals to a cell before calculating the statistics relating to that cell.
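One way to read "adds two more arrivals" is the score-type interval for a single Poisson count x, obtained by solving (x - λ)² = z²λ for λ. Its centre is x + z²/2, which is about x + 2 at the 95% level. The sketch below assumes that form; it is not necessarily the exact expression in the Cai paper, and the function name is illustrative. It is applied to the three predicted counts from the question.

```python
import math

def poisson_score_interval(count, z=1.96):
    """Score-type interval for a Poisson mean from one observed count.
    The centre is shifted up by z^2/2 (roughly 2 when z is about 2)."""
    centre = count + z * z / 2
    half = z * math.sqrt(count + z * z / 4)
    return max(0.0, centre - half), centre + half

# Predicted counts from the question
for label, count in [("A", 100), ("B", 1400), ("C", 500)]:
    low, high = poisson_score_interval(count)
    print(f"Class {label}: {count}  ({low:.1f}, {high:.1f})")
```

Note that this treats each count as an independent Poisson arrival total, so it reflects sampling variation in the counts only and does not use the error rates in the confusion matrix.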

Wei Pan (2001) derives some other possible measures based around the binomial distribution and the T-test.

The Cai paper is here: http://www-stat.wharton.upenn.edu/~tcai/paper/Plugin-Exp-CI.pdf

My paper is here: http://dspace2.flinders.edu.au/xmlui/bitstream/handle/2328/27165/Powers%20Evaluation.pdf

This is something I'm still exploring - hence returning periodically to see if there are any new/useful contributions on the topic...

score 1 · Answer 2 · answered Jan 16 '15 at 15:17

I don't see the value in confidence intervals on (elements of) a contingency table. I suggest considering ROC curves instead, because confidence varies per prediction, not per class. That is assuming you have a model that is more informative than simply positive/negative.

Consider logistic regression at the standard threshold of 50% probability to decide an instance is positive. In terms of a contingency table, probabilities of 51% and 99% are treated the same even though the model's output clearly shows that they are not. A confidence interval on precision (for instance) would abstract all this information away.

Marc Claesen

The thing is that this is for a reporting application that sums the predicted values over a set of data per class. The predictions have already been made. The only information I have is the number of predictions of each class and the confusion matrix from the model. I don't have any per-prediction information. – David Tinker Jan 16 '15 at 16:23

Precision and ROC are not applicable to the multiclass problem described here. Even though they can be applied to each of the pairwise subproblems or the one-vs-rest subproblems, optimizing Precision, Recall or ROC inherently optimizes one of these problems in a way that in general moves the others away from their optima. – David M W Powers Mar 29 '16 at 06:17

Felipe Gerard · Answer 3 · 2015-12-12T15:35:00.633

-1

I agree with the others. If you want it to make sense you'd have to give the "interval" for a particular individual. You could give for example a list of the most probable classes as a "CI" instead of just the most likely one for that observation. If you don't have the information per observation though, what's the point?

EDIT: With "the most likely classes" I mean for example those whose probabilities sum up to 95%.
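A minimal sketch of that idea, assuming per-observation class probabilities are available (which the asker says is not the case here): rank the classes by probability and keep adding them until the cumulative probability reaches the target. The function name and the example probabilities are hypothetical.

```python
def top_classes_covering(probabilities, coverage=0.95):
    """Smallest set of most likely classes whose probabilities
    sum to at least `coverage`."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for label, p in ranked:
        chosen.append(label)
        total += p
        if total >= coverage:
            break
    return chosen

# Hypothetical predicted probabilities for one observation
probs = {"A": 0.04, "B": 0.70, "C": 0.26}
print(top_classes_covering(probs))
```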

edited Dec 12 '15 at 15:35

answered Dec 12 '15 at 15:27

What individuals? What observations? This is the result of CV. There are a number of methods proposed for doing significance tests (see e.g. the Witten, Data Mining/Weka textbook). However, the question of getting confidence intervals for the conditional probabilities represented by either the individual cells or the margins (the recall and precision type stats) or any kind of aggregate statistics, does not have much of a literature - and there is particularly a dearth of discussion of the multiclass problem. – David M W Powers Mar 29 '16 at 06:22
Since I agree with @Marc Claesen, I don't see the benefit of the proposed confidence intervals. Therefore I proposed an alternative in observation space, which is where the confusion matrix comes from. It is not exactly an answer, you're right, but it's a tip and a way out. – Felipe Gerard Mar 29 '16 at 14:30
