
I have some metrics that all lie within [0,1], and I have multiple measurements of each. For example, one metric is accuracy (for a machine learning application). Accuracy always lies within [0,1], and given multiple rounds of N-fold cross-validation, there will be multiple measurements of accuracy. Say I have 50 values of accuracy: how can I form a confidence interval around the mean accuracy such that the interval does not extend above 1 or below 0? More generally, how can I compute a confidence interval around the average of multiple measurements of a variable that always lies in [0,1] but does not necessarily follow a uniform distribution (e.g., it might follow a normal distribution truncated to [0,1])?

CopyOfA
    Accuracy is just a binomial variable of getting the right answer or not, so [binomial methods](https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval) apply. However, you seem to be interested in assessing machine learning performance. [Accuracy, as well as other threshold-based metrics like sensitivity, specificity, and $F_1$ score, is surprisingly problematic.](https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models/312787#312787) – Dave Jul 14 '21 at 19:39
  • Thanks! The issue I'm having with using binomial methods is that I don't have the number of experiments/trials. I am given N measurements of accuracy (or pick other metric in [0,1]) and I would like to know the confidence interval of the average of these N measurements. Maybe I'm missing something from the Wiki link? – CopyOfA Jul 14 '21 at 20:06
  • $N$ measurements from something like cross validation? – Dave Jul 14 '21 at 20:10
  • Technically from N runs of (10-fold) cross validation. – CopyOfA Jul 14 '21 at 20:31
  • If you don't have the number of measurements, does that mean that all you have is the average accuracy? Then you are in a bit of a quandary, because of course the CI from a single accuracy will be a lot wider than the CI of an average of 100 accuracies. – Stephan Kolassa Jul 14 '21 at 20:40
  • No, I don't have the number of measurements that produce each accuracy value. That is, for a given run of 10-fold CV, I have an accuracy measurement. So for N runs of 10-fold CV, I have N accuracy values. – CopyOfA Jul 16 '21 at 13:58

1 Answer


More of a comment than an answer, but I need the space provided by Answer format.

If you have 50 values from an unknown distribution with support $[0,1],$ then you might consider a bootstrap confidence interval for the population mean.

    # Fictitious data
    set.seed(714)
    x = rbeta(50, 4, 10)   # pop. mean 4/14 = 0.2857
    summary(x)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     0.1026  0.2131  0.2925  0.2948  0.3440  0.5638 

    # Quantile (percentile) bootstrap of the sample mean
    set.seed(2021)
    a.re = replicate(3000, mean(sample(x, 50, replace=TRUE)))
    CI = quantile(a.re, c(.025, .975));  CI

         2.5%     97.5% 
    0.2636896 0.3252635   # contains pop. mean (this time & usually)

    hist(a.re, prob=TRUE, col="skyblue2",
         main="Bootstrap Dist'n: Resampled Means")
    abline(v=CI, col="orange", lwd=2, lty="dotted")

[Histogram of the bootstrap distribution of resampled means, with the 95% CI endpoints marked by dotted orange lines]

There are many styles of 95% nonparametric bootstrap CIs; just be sure to choose one that is constrained to give endpoints in $[0,1].$
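For instance, the percentile interval above is automatically constrained: each resampled mean is an average of values in $[0,1],$ so its quantiles cannot leave the interval. By contrast, a normal-approximation interval (mean $\pm 1.96\,$SE) carries no such guarantee and can spill past a boundary when the accuracies cluster near 0 or 1. A minimal sketch, using made-up accuracy values near the upper boundary chosen to exhibit the problem:

```r
# Made-up accuracies near the upper boundary (48 perfect runs, 2 near-perfect)
x <- c(rep(1, 48), 0.90, 0.95)

set.seed(2021)
a.re <- replicate(3000, mean(sample(x, length(x), replace = TRUE)))

# Normal-approximation CI: upper endpoint exceeds 1 here
norm.ci <- mean(x) + c(-1, 1) * 1.96 * sd(a.re)

# Percentile CI: endpoints are themselves resampled means, so always in [0,1]
pct.ci <- quantile(a.re, c(.025, .975))

norm.ci; pct.ci
```

The same logic applies to any metric bounded in $[0,1]$: resample the 50 observed values, take the mean of each resample, and report quantiles of those means rather than a symmetric mean-plus-or-minus interval.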

BruceET