
How do we calculate a confidence interval for a result from a binary classifier?

A CI for regression problems makes sense, since we have a continuous estimated output for which I can calculate an estimated mean and then get the SE around it.

For classification problems, we only have metrics like FPR/TPR/AUC, precision/accuracy, and class probabilities. Besides, the class distribution usually cannot be approximated by a known distribution.

I am implementing a RandomForest classifier in Python for an imbalanced binary classification problem.

acer_7
Possible duplicate of [ROC/AUC Confidence Interval](https://stats.stackexchange.com/questions/109104/roc-auc-confidence-interval). Also this may be of interest: https://stats.stackexchange.com/questions/358101/statistical-significance-p-value-for-comparing-two-classifiers-with-respect-to/358598#358598 – Sycorax Sep 14 '18 at 15:09

2 Answers


You may apply the bootstrap to calculate confidence intervals.

Under this method, you draw a bootstrap sample (with replacement) from the input data, train the model, and calculate the metric of interest (be it accuracy, precision, Matthews correlation coefficient, etc.). You repeat this procedure N times, and from the resulting distribution of the metric you can then easily extract confidence intervals.

You may find complete information on bootstrapping here: https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading24.pdf
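A minimal sketch of this procedure with a percentile bootstrap, assuming scikit-learn; the synthetic dataset from `make_classification` and the repetition count are placeholders for the asker's actual data and settings:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder data; substitute your own X, y.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

rng = np.random.default_rng(0)
n_boot = 200   # number of bootstrap repetitions (N); more is better but slower
scores = []

for _ in range(n_boot):
    # Resample the training data with replacement, refit, and score on the
    # fixed test set.
    idx = rng.integers(0, len(X_train), len(X_train))
    clf = RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=-1)
    clf.fit(X_train[idx], y_train[idx])
    scores.append(accuracy_score(y_test, clf.predict(X_test)))

# Percentile bootstrap: the 2.5th and 97.5th percentiles give a 95% CI.
lower, upper = np.percentile(scores, [2.5, 97.5])
print(f"mean accuracy = {np.mean(scores):.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The same loop works for any metric (AUC, precision, MCC): just swap the scoring function.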

Thanos

I'm not sure for which property you need the confidence interval, but here we go:

  • In case you need confidence intervals for the validation results (i.e., the classifier has an accuracy of p ± Δp): for proportions you can calculate binomial confidence intervals (see the sketch after this list).

  • In case you are asking about confidence intervals for the predictions:

    • For an ensemble that votes on class labels (such as a random forest), you could construct similar intervals for the proportion of trees that voted for the predicted class (also sketched below).
    • You could also do something like bolstered error estimation: perturb your input data and measure the distribution of predictions to which the perturbed inputs are mapped (a toy version is included in the sketch below). You'd need to have a good idea of the noise structure on your input, though.
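A minimal sketch of these ideas, assuming scikit-learn's `RandomForestClassifier` and statsmodels' `proportion_confint`; the counts, the fitted `clf`, and the Gaussian noise model are illustrative placeholders:

```python
import numpy as np
from statsmodels.stats.proportion import proportion_confint

# --- Binomial CI for a validation proportion such as accuracy ---
# Hypothetical counts: 870 correct predictions out of 1000 test cases.
n_correct, n_total = 870, 1000
lo, hi = proportion_confint(n_correct, n_total, alpha=0.05, method="wilson")
print(f"accuracy = {n_correct / n_total:.3f}, 95% Wilson CI = [{lo:.3f}, {hi:.3f}]")

# --- CI for the proportion of trees voting for the predicted class ---
def tree_vote_interval(clf, x, alpha=0.05):
    """clf: fitted sklearn RandomForestClassifier, x: a single input row."""
    x = np.asarray(x).reshape(1, -1)
    predicted = clf.predict(x)[0]
    # The individual trees predict encoded class indices (0, 1, ...),
    # which index into clf.classes_.
    votes = clf.classes_[[int(t.predict(x)[0]) for t in clf.estimators_]]
    n_for = int(np.sum(votes == predicted))
    return proportion_confint(n_for, len(votes), alpha=alpha, method="wilson")

# --- Bolstered idea: distribution of predictions under input noise ---
def bolstered_predictions(clf, x, noise_sd, n_perturb=200, seed=0):
    """Perturb x with Gaussian noise (an assumed noise model) and return the
    predicted class probabilities for each perturbed copy."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).reshape(1, -1)
    noisy = x + rng.normal(scale=noise_sd, size=(n_perturb, x.shape[1]))
    return clf.predict_proba(noisy)
```

Wilson intervals behave better than the plain normal approximation when the proportion is close to 0 or 1, which is common for imbalanced problems.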
cbeleites unhappy with SX