
Let's say that I have a binary classifier and perform leave-one-out cross-validation.
I then have one vector of predicted labels $Y_{pred}$ and one of true labels $Y_{true}$.
Is it correct to perform bootstrapping on the pairs $(Y_{pred,i}, Y_{true,i})$ to estimate the CI of the accuracy?

In other words, given a dataset with $M$ samples, $(Y_{true},Y_{pred}\in\{0,1\}^M)$:

  • for $n=1, \ldots,N$:
  • define $I$ by randomly selecting $M$ indices $i\in\{1,\ldots,M\}$ with replacement
  • calculate the accuracy with the selected pairs $a_n=Accuracy(Y_{true,i},Y_{pred,i}), \quad i\in I$

This procedure gives me $N$ values of the accuracy $a_n$ from which I can estimate the CI.
Is this a correct procedure to estimate the variability of the accuracy? If not, what does this procedure estimate?
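
For concreteness, here is a minimal sketch of the procedure I have in mind (NumPy-based; the function name, the `n_boot = 2000` replicates, and the percentile CI are just illustrative choices):

```python
import numpy as np

def pair_bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Resample the (y_true, y_pred) pairs from LOO-CV and return a
    percentile confidence interval for the accuracy."""
    rng = np.random.default_rng(seed)
    m = len(y_true)
    accs = np.empty(n_boot)
    for n in range(n_boot):
        idx = rng.integers(0, m, size=m)                  # M indices drawn with replacement
        accs[n] = np.mean(y_true[idx] == y_pred[idx])     # accuracy on the resampled pairs
    return np.quantile(accs, [alpha / 2, 1 - alpha / 2])  # percentile CI
```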

1 Answer


What you seem to be missing is that after drawing the bootstrap sample, you use that data to train the model, then make predictions with it and calculate the accuracy. Resampling results calculated on all the data would not work, because it does nothing to check what the results would be if you had different data to train your model on.

See this and this thread for explanations of how the bootstrap works. TL;DR: it imitates sampling from the population by sampling from the empirical distribution. Randomly resampling the accuracy calculated on the full sample does not imitate anything related to sampling your data, so there is no reason why this procedure would be useful for estimating anything.
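
As a rough sketch of what I mean (the `LogisticRegression` classifier is only a placeholder, and scoring each refit on the out-of-bag observations is one common convention, not something fixed by the bootstrap itself): resample the data, refit the model on each bootstrap sample, and only then compute the accuracy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def bootstrap_model_accuracies(X, y, n_boot=2000, seed=0):
    """Refit the model on each bootstrap sample of the data and score it
    on the out-of-bag observations."""
    rng = np.random.default_rng(seed)
    m = len(y)
    accs = []
    for _ in range(n_boot):
        idx = rng.integers(0, m, size=m)           # bootstrap sample of the *data*
        oob = np.setdiff1d(np.arange(m), idx)      # observations not drawn this round
        if oob.size == 0 or len(np.unique(y[idx])) < 2:
            continue                               # skip degenerate draws
        model = LogisticRegression().fit(X[idx], y[idx])
        accs.append(np.mean(model.predict(X[oob]) == y[oob]))
    return np.array(accs)  # spread reflects variability across training samples
```

The spread of these accuracies reflects how the model would behave with different training data, which is exactly what resampling the fixed $(Y_{true}, Y_{pred})$ pairs cannot capture.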

Tim