
How do we calculate a confidence interval for a result from a binary classifier?

A CI for regression problems makes sense, since we have a continuous estimated output for which I can calculate an estimated mean and then get the SE around it.

For classification problems, we only have metrics like FPR/TPR/AUC, precision/accuracy, and class probabilities. Besides, the class distribution usually cannot be approximated by a known distribution.

I am implementing a RandomForest classifier in Python for an imbalanced binary classification problem.

acer_7
Possible duplicate of [ROC/AUC Confidence Interval](https://stats.stackexchange.com/questions/109104/roc-auc-confidence-interval). Also this may be of interest: https://stats.stackexchange.com/questions/358101/statistical-significance-p-value-for-comparing-two-classifiers-with-respect-to/358598#358598 – Sycorax Sep 14 '18 at 15:09

2 Answers


You may apply the bootstrap to calculate confidence intervals.

Under this method, you draw a bootstrap sample (with replacement) from the input data, train the model, and calculate the metric of interest (be it accuracy, precision, Matthews correlation coefficient, etc.). You repeat this procedure N times, and from the resulting distribution of the metric you can then easily extract confidence intervals.

You may find complete information on bootstrapping here: https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading24.pdf
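A minimal sketch of this procedure with a percentile bootstrap, assuming scikit-learn; the synthetic dataset from `make_classification` and the repetition count are placeholders for the asker's actual data and settings:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder data; substitute your own X, y.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

rng = np.random.default_rng(0)
n_boot = 200   # number of bootstrap repetitions (N); more is better but slower
scores = []

for _ in range(n_boot):
    # Resample the training data with replacement, refit, and score on the
    # fixed test set.
    idx = rng.integers(0, len(X_train), len(X_train))
    clf = RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=-1)
    clf.fit(X_train[idx], y_train[idx])
    scores.append(accuracy_score(y_test, clf.predict(X_test)))

# Percentile bootstrap: the 2.5th and 97.5th percentiles give a 95% CI.
lower, upper = np.percentile(scores, [2.5, 97.5])
print(f"mean accuracy = {np.mean(scores):.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The same loop works for any metric (AUC, precision, MCC): just swap the scoring function.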

Thanos

I'm not sure for which property you need the confidence interval, but here we go:

  • In case you need confidence intervals for the validation results (i.e., the classifier has an accuracy of p ± Δp): for proportions you can calculate binomial confidence intervals (see the sketch after this list).

  • In case you are asking about confidence intervals for the predictions:

    • For an ensemble that votes on class labels (such as a random forest), you could construct similar intervals for the proportion of trees that voted for the predicted class (also sketched below).
    • You could also do something like bolstered error estimation: perturb your input data and measure the distribution of predictions to which the perturbed inputs are mapped (a toy version is included in the sketch below). You'd need to have a good idea of the noise structure on your input, though.
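A minimal sketch of these ideas, assuming scikit-learn's `RandomForestClassifier` and statsmodels' `proportion_confint`; the counts, the fitted `clf`, and the Gaussian noise model are illustrative placeholders:

```python
import numpy as np
from statsmodels.stats.proportion import proportion_confint

# --- Binomial CI for a validation proportion such as accuracy ---
# Hypothetical counts: 870 correct predictions out of 1000 test cases.
n_correct, n_total = 870, 1000
lo, hi = proportion_confint(n_correct, n_total, alpha=0.05, method="wilson")
print(f"accuracy = {n_correct / n_total:.3f}, 95% Wilson CI = [{lo:.3f}, {hi:.3f}]")

# --- CI for the proportion of trees voting for the predicted class ---
def tree_vote_interval(clf, x, alpha=0.05):
    """clf: fitted sklearn RandomForestClassifier, x: a single input row."""
    x = np.asarray(x).reshape(1, -1)
    predicted = clf.predict(x)[0]
    # The individual trees predict encoded class indices (0, 1, ...),
    # which index into clf.classes_.
    votes = clf.classes_[[int(t.predict(x)[0]) for t in clf.estimators_]]
    n_for = int(np.sum(votes == predicted))
    return proportion_confint(n_for, len(votes), alpha=alpha, method="wilson")

# --- Bolstered idea: distribution of predictions under input noise ---
def bolstered_predictions(clf, x, noise_sd, n_perturb=200, seed=0):
    """Perturb x with Gaussian noise (an assumed noise model) and return the
    predicted class probabilities for each perturbed copy."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).reshape(1, -1)
    noisy = x + rng.normal(scale=noise_sd, size=(n_perturb, x.shape[1]))
    return clf.predict_proba(noisy)
```

Wilson intervals behave better than the plain normal approximation when the proportion is close to 0 or 1, which is common for imbalanced problems.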
cbeleites unhappy with SX