
Let's say that I have a binary classifier and perform leave-one-out cross-validation.
I then have one vector of predicted labels $Y_{pred}$ and one of true labels $Y_{true}$.
Is it correct to perform bootstrapping on the pairs $(Y_{pred,i}, Y_{true,i})$ to estimate the CI of the accuracy?

In other words, given a dataset with $M$ samples, $(Y_{true},Y_{pred}\in\{0,1\}^M)$:

  • for $n=1, \ldots,N$:
  • define $I$ by randomly selecting $M$ indices $i\in\{1,\ldots,M\}$ with replacement
  • calculate the accuracy with the selected pairs $a_n=Accuracy(Y_{true,i},Y_{pred,i}), \quad i\in I$

This procedure gives me $N$ values of the accuracy $a_n$ from which I can estimate the CI.
Is this a correct procedure to estimate the variability of the accuracy? If not, what does this procedure estimate?
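
For concreteness, here is a minimal sketch of the procedure I have in mind (NumPy-based; the function name, the `n_boot = 2000` replicates, and the percentile CI are just illustrative choices):

```python
import numpy as np

def pair_bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Resample the (y_true, y_pred) pairs from LOO-CV and return a
    percentile confidence interval for the accuracy."""
    rng = np.random.default_rng(seed)
    m = len(y_true)
    accs = np.empty(n_boot)
    for n in range(n_boot):
        idx = rng.integers(0, m, size=m)                  # M indices drawn with replacement
        accs[n] = np.mean(y_true[idx] == y_pred[idx])     # accuracy on the resampled pairs
    return np.quantile(accs, [alpha / 2, 1 - alpha / 2])  # percentile CI
```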

1 Answer


What you seem to be missing is that after drawing the bootstrap sample, you use that data to train the model, then make predictions with it and calculate the accuracy. Resampling results calculated on all the data would not work, because it does nothing to check what the results would be if you had different data to train your model on.

See this and this thread for explanations of how the bootstrap works. TL;DR: it imitates sampling from the population by sampling from the empirical distribution. Randomly resampling the accuracy calculated on the full sample does not imitate anything related to sampling your data, so there is no reason why this procedure would be useful for estimating anything.
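
As a rough sketch of what I mean (the `LogisticRegression` classifier is only a placeholder, and scoring each refit on the out-of-bag observations is one common convention, not something fixed by the bootstrap itself): resample the data, refit the model on each bootstrap sample, and only then compute the accuracy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def bootstrap_model_accuracies(X, y, n_boot=2000, seed=0):
    """Refit the model on each bootstrap sample of the data and score it
    on the out-of-bag observations."""
    rng = np.random.default_rng(seed)
    m = len(y)
    accs = []
    for _ in range(n_boot):
        idx = rng.integers(0, m, size=m)           # bootstrap sample of the *data*
        oob = np.setdiff1d(np.arange(m), idx)      # observations not drawn this round
        if oob.size == 0 or len(np.unique(y[idx])) < 2:
            continue                               # skip degenerate draws
        model = LogisticRegression().fit(X[idx], y[idx])
        accs.append(np.mean(model.predict(X[oob]) == y[oob]))
    return np.array(accs)  # spread reflects variability across training samples
```

The spread of these accuracies reflects how the model would behave with different training data, which is exactly what resampling the fixed $(Y_{true}, Y_{pred})$ pairs cannot capture.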

Tim