In bootstrapping, within a single instance of resampling, why do you draw a number of samples equal to your starting N?

Question

I understand that the basic premise of bootstrapping a dataset with N data points is:

1. Sample with replacement from the N data points, making a number of draws, D, equal to N.
2. Repeat step 1 K times, that is, as many times as you can afford computationally (or until the estimated parameters/distribution look roughly normal and stable, with minimal gain from increasing K).
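The two steps above can be sketched in NumPy for the sample mean. This is a minimal illustration, not a reference implementation: the function name `bootstrap_means`, its parameters `k` and `d`, and the simulated data (N = 50 normal values) are all hypothetical choices made for the example.

```python
import numpy as np

def bootstrap_means(data, k=10_000, d=None, seed=0):
    """Hypothetical helper: return k bootstrap replicates of the sample mean.

    d is the number of draws per resample (D in the question);
    it defaults to N = len(data), the standard choice.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.size
    d = n if d is None else d
    # Step 1, repeated k times (step 2): each row holds d indices drawn
    # uniformly with replacement from 0..n-1.
    idx = rng.integers(0, n, size=(k, d))
    return data[idx].mean(axis=1)

# Illustrative data (an assumption, not from the question): N = 50 normal draws.
sample = np.random.default_rng(1).normal(loc=10.0, scale=2.0, size=50)
reps = bootstrap_means(sample)
boot_se = reps.std(ddof=1)
# Plug-in SE of the mean: sigma_hat / sqrt(N), using the 1/N variance.
plugin_se = sample.std(ddof=0) / np.sqrt(sample.size)
print(f"bootstrap SE: {boot_se:.4f}, plug-in SE: {plugin_se:.4f}")
```

With D = N, the spread of the K replicate means should closely track the plug-in standard error of the original mean.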

Why does the number of draws D equal N? I can understand intuitively that it wouldn't be ideal to make fewer draws than N, since fewer draws would limit your ability to estimate the mean correctly. I also saw this post suggesting that there are cases where it might be permissible to draw a D smaller than N.

But since you're sampling with replacement, what stops you from drawing a D twice as large as N, or three times as large? Is it because it would "over-weight" data points that appear frequently in the dataset relative to rarer ones?
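For the sample mean specifically, a standard calculation (not part of the original post) shows how D enters. Let $X_1^*,\dots,X_D^*$ be D draws with replacement from the observed $x_1,\dots,x_N$, so each draw has mean $\bar{x}$ and variance $\hat{\sigma}^2$ under resampling:

```latex
\begin{aligned}
\bar{x} &= \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2,\\[4pt]
\operatorname{Var}^*\!\left(\frac{1}{D}\sum_{j=1}^{D} X_j^*\right)
  &= \frac{\hat{\sigma}^2}{D},
\qquad\text{whereas}\qquad
\operatorname{Var}(\bar{x}) = \frac{\sigma^2}{N}.
\end{aligned}
```

With D = N the bootstrap variance equals the plug-in estimate $\hat{\sigma}^2/N$ of the quantity being targeted. With D = cN it is smaller by a factor of c (the standard error by $\sqrt{c}$). Every draw still selects each observation with probability 1/N, so a larger D does not over-weight particular points; it describes the variability of a sample of size D, which was never observed.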

Thank you for your help!

You're usually not estimating the population mean with bootstrapping, but the variance of the mean-estimate. If you take more datapoints, you'll artificially reduce that variance. — Matthew Drury, Mar 02 '21 at 19:18
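The comment's point can be checked numerically. The sketch below is illustrative only: the exponential data (N = 40), the number of resamples, and the helper `boot_se` are assumptions chosen for the example. It computes the bootstrap standard error of the mean for D = N/2, N, 2N and 3N.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.exponential(scale=1.0, size=40)  # illustrative data, N = 40
n = data.size
k = 20_000  # number of bootstrap resamples

def boot_se(d):
    # Hypothetical helper: bootstrap SE of the mean when each resample has d draws.
    idx = rng.integers(0, n, size=(k, d))
    return data[idx].mean(axis=1).std(ddof=1)

results = {}
for mult in (0.5, 1, 2, 3):
    d = int(mult * n)
    results[mult] = boot_se(d)
    print(f"D = {d:3d} ({mult}N): bootstrap SE = {results[mult]:.4f}")

# Expected behaviour: the bootstrap SE with D draws is about sigma_hat / sqrt(D),
# so SE(D=N) / SE(D=mult*N) should be close to sqrt(mult).
for mult in (0.5, 2, 3):
    print(f"SE(N)/SE({mult}N) = {results[1] / results[mult]:.3f}, "
          f"sqrt({mult}) = {np.sqrt(mult):.3f}")
```

Only the D = N row matches the plug-in standard error of the original mean; D = 2N and D = 3N shrink it by roughly √2 and √3, while D = N/2 inflates it.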

