
I understand that the basic premise of bootstrapping a dataset with N data points is:

  1. Draw D points with replacement from the N data points, where the number of draws D equals N;
  2. Repeat step 1 K times, that is, as many times as you can afford computationally (or until the estimated parameters/distribution are relatively normal and stable, with minimal gain from increasing K, etc.).
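A minimal sketch of the two steps above in Python, assuming the statistic of interest is the sample mean. The function name `bootstrap_means`, the toy data, and the default K = 2000 are illustrative choices, not part of the original question; D is exposed as a parameter that defaults to N.

```python
import random
import statistics

def bootstrap_means(data, k=2000, d=None, seed=0):
    """Return k bootstrap replicates of the sample mean (hypothetical helper).

    Each replicate draws d points with replacement from data (step 1);
    d defaults to len(data), i.e. D = N. The loop repeats this k times (step 2).
    """
    rng = random.Random(seed)
    n = len(data)
    d = n if d is None else d
    replicates = []
    for _ in range(k):
        resample = rng.choices(data, k=d)  # sampling with replacement
        replicates.append(statistics.fmean(resample))
    return replicates

# Illustrative data, N = 10.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
reps = bootstrap_means(data)
se = statistics.stdev(reps)  # bootstrap estimate of the standard error of the mean
print(f"bootstrap SE of the mean: {se:.3f}")
```

The spread of `reps`, not their average, is usually the quantity of interest: their average just sits near the original sample mean.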

Why does the number of draws D equal N? I can understand intuitively that it wouldn't be great to select fewer draws than N, since fewer draws will limit your ability to correctly estimate the mean. I also saw this post suggesting that there are cases when it might be permissible to draw a D smaller than N.

But since you're sampling with replacement, what is stopping you from drawing a D twice as large as N? Or three times? Is it because it would "over-weight" data points that appear more frequently in the dataset relative to rarer data points?
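One way to see what happens with D > N is to compare bootstrap standard errors of the mean for D = N, 2N and 3N on a simulated sample. The data here are hypothetical (N = 50 normal draws, purely for illustration), and `bootstrap_se` is an illustrative helper, not a library function.

```python
import math
import random
import statistics

def bootstrap_se(data, d, k=4000, rng=None):
    """Bootstrap standard error of the mean, using d draws per resample."""
    rng = random.Random(0) if rng is None else rng
    means = [statistics.fmean(rng.choices(data, k=d)) for _ in range(k)]
    return statistics.stdev(means)

# Hypothetical sample: N = 50 draws from a normal distribution (illustration only).
gen = random.Random(42)
data = [gen.gauss(10.0, 2.0) for _ in range(50)]
n = len(data)

# Textbook standard error of the mean, s / sqrt(N).
classical_se = statistics.stdev(data) / math.sqrt(n)

results = {}
for multiple in (1, 2, 3):
    # D = multiple * N draws per bootstrap resample.
    results[multiple] = bootstrap_se(data, d=multiple * n, rng=random.Random(multiple))
    print(f"D = {multiple}N: bootstrap SE = {results[multiple]:.3f}")
print(f"s/sqrt(N)   = {classical_se:.3f}")
```

Only the D = N row should land close to s/√N (slightly below it, by a factor of about √((N−1)/N)). D = 2N and D = 3N shrink the reported standard error by roughly √2 and √3, even though no new information about the population has been added.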

Thank you for your help!

Kristin M
You're usually not estimating the population mean with bootstrapping, but the variance of the mean estimate. If you take more data points, you'll artificially reduce that variance. – Matthew Drury Mar 02 '21 at 19:18
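The point about artificially reduced variance can be written out for the sample mean. Treating each draw made with replacement as an independent draw from the empirical distribution of the N observed points (the notation below is introduced here for illustration), the variance of a resampled mean depends directly on D:

```latex
% x_1, ..., x_N: observed data with mean \bar{x} and sample variance s^2
% X^*_1, ..., X^*_D: draws with replacement, i.i.d. from the empirical distribution
\begin{aligned}
\operatorname{Var}^*(X^*_j) &= \hat\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2 = \frac{N-1}{N}\,s^2,\\
\bar{X}^*_D &= \frac{1}{D}\sum_{j=1}^{D} X^*_j,\\
\operatorname{Var}^*\!\left(\bar{X}^*_D\right) &= \frac{\hat\sigma^2}{D}.
\end{aligned}
% D = N:  \hat\sigma^2 / N = \frac{N-1}{N} \cdot \frac{s^2}{N} \approx \frac{s^2}{N}, the usual squared standard error
% D = 2N: \hat\sigma^2 / (2N), half that variance, so the standard error shrinks by \sqrt{2}
```

With D = N the resampled mean mimics the sampling variability of a mean computed from N observations, which is the quantity the bootstrap is trying to estimate; any other D rescales that variability by N/D.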
