
I have some metrics that all lie within [0,1], and I have multiple measurements of each. For example, one metric is accuracy (for a machine learning application). Accuracy always lies within [0,1], and given multiple rounds of N-fold cross-validation, there will be multiple measurements of accuracy. Say I have 50 values of accuracy: how can I form a confidence interval around the mean accuracy such that the interval does not extend above 1 or below 0? More generally, how can I compute a confidence interval around the average of multiple measurements of a variable that always lies in [0,1] but does not necessarily follow a uniform distribution (e.g., it might follow a normal distribution truncated to [0,1])?

CopyOfA
    Accuracy is just a binomial variable of getting the right answer or not, so [binomial methods](https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval) apply. However, you seem to be interested in assessing machine learning performance. [Accuracy, as well as other threshold-based metrics like sensitivity, specificity, and $F_1$ score, is surprisingly problematic.](https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models/312787#312787) – Dave Jul 14 '21 at 19:39
  • Thanks! The issue I'm having with using binomial methods is that I don't have the number of experiments/trials. I am given N measurements of accuracy (or pick other metric in [0,1]) and I would like to know the confidence interval of the average of these N measurements. Maybe I'm missing something from the Wiki link? – CopyOfA Jul 14 '21 at 20:06
  • $N$ measurements from something like cross validation? – Dave Jul 14 '21 at 20:10
  • Technically from N runs of (10-fold) cross validation. – CopyOfA Jul 14 '21 at 20:31
  • If you don't have the number of measurements, does that mean that all you have is the average accuracy? Then you are in a bit of a quandary, because of course the CI from a single accuracy will be a lot wider than the CI of an average of 100 accuracies. – Stephan Kolassa Jul 14 '21 at 20:40
  • No, I don't have the number of measurements that produce each accuracy value. That is, for a given run of 10-fold CV, I have an accuracy measurement. So for N runs of 10-fold CV, I have N accuracy values. – CopyOfA Jul 16 '21 at 13:58

1 Answer


More of a comment than an answer, but I need the space provided by Answer format.

If you have 50 values from an unknown distribution with support $[0,1],$ then you might consider a bootstrap confidence interval for the population mean.

    # Fictitious data
    set.seed(714)
    x = rbeta(50, 4, 10)   # pop. mean 4/14 = 0.2857
    summary(x)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     0.1026  0.2131  0.2925  0.2948  0.3440  0.5638 

    # Quantile (percentile) bootstrap of the sample mean
    set.seed(2021)
    a.re = replicate(3000, mean(sample(x, 50, replace=TRUE)))
    CI = quantile(a.re, c(.025, .975));  CI

         2.5%     97.5% 
    0.2636896 0.3252635   # contains pop. mean (this time & usually)

    hist(a.re, prob=TRUE, col="skyblue2",
         main="Bootstrap Dist'n: Resampled Means")
    abline(v=CI, col="orange", lwd=2, lty="dotted")

[Histogram of the bootstrap distribution of resampled means, with the 95% CI endpoints marked by dotted orange lines]

There are many styles of 95% nonparametric bootstrap CIs; just be sure to choose one that is constrained to give endpoints in $[0,1].$
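For instance, the percentile interval above is automatically constrained: each resampled mean is an average of values in $[0,1],$ so its quantiles cannot leave the interval. By contrast, a normal-approximation interval (mean $\pm 1.96\,$SE) carries no such guarantee and can spill past a boundary when the accuracies cluster near 0 or 1. A minimal sketch, using made-up accuracy values near the upper boundary chosen to exhibit the problem:

```r
# Made-up accuracies near the upper boundary (48 perfect runs, 2 near-perfect)
x <- c(rep(1, 48), 0.90, 0.95)

set.seed(2021)
a.re <- replicate(3000, mean(sample(x, length(x), replace = TRUE)))

# Normal-approximation CI: upper endpoint exceeds 1 here
norm.ci <- mean(x) + c(-1, 1) * 1.96 * sd(a.re)

# Percentile CI: endpoints are themselves resampled means, so always in [0,1]
pct.ci <- quantile(a.re, c(.025, .975))

norm.ci; pct.ci
```

The same logic applies to any metric bounded in $[0,1]$: resample the 50 observed values, take the mean of each resample, and report quantiles of those means rather than a symmetric mean-plus-or-minus interval.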

BruceET