I'm still wrapping my head around bootstrapping and am struggling to think of how it is applied in practice. I have looked at: Explaining to laypeople why bootstrapping works
So far, my understanding is that the key implicit assumption behind bootstrapping is that our observed sample serves as a proxy for the population. In other words, the frequencies of our sample observations are proportional to the population density.
We then sample from our observed data (treating it as the population) with replacement, generating a bunch of resamples, each typically the same size as the original sample. From these resamples, we calculate statistics of interest like the mean, SD, etc.
The second key point is that this lets us approximate the sampling distributions of those statistics, from which we can then assess their precision (standard errors, confidence intervals, etc.).
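To make sure I have the mechanics right, here's roughly what I understand the procedure to be, as a minimal sketch in Python (the data here are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up sample standing in for "our observed data".
data = rng.normal(loc=10, scale=3, size=50)

n_boot = 10_000
boot_means = np.empty(n_boot)
for i in range(n_boot):
    # Resample with replacement, same size as the original sample.
    resample = rng.choice(data, size=data.size, replace=True)
    boot_means[i] = resample.mean()

# The spread of the bootstrap distribution estimates the precision
# (standard error) of the sample mean.
print("bootstrap SE of the mean:", boot_means.std(ddof=1))
print("95% percentile interval:", np.percentile(boot_means, [2.5, 97.5]))
```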
Maybe it's just the way this topic has been introduced in class, but I still don't get why we would use bootstrapping or when we would apply it. It seems redundant, adding no value to what we are already doing. Don't most statistics (such as the mean, or regression coefficients B0, B1, etc.) have sampling distributions that are approximately normal? What would bootstrapping add?
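For instance, here is what makes it feel redundant to me: for the sample mean, the textbook standard error s / sqrt(n) and the bootstrap standard error come out nearly identical (a sketch reusing the same made-up data as above, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=10, scale=3, size=50)

# Textbook (analytic) standard error of the mean: s / sqrt(n).
analytic_se = data.std(ddof=1) / np.sqrt(data.size)

# Bootstrap standard error of the mean.
boot_means = [rng.choice(data, size=data.size, replace=True).mean()
              for _ in range(10_000)]
bootstrap_se = np.std(boot_means, ddof=1)

print(f"analytic SE:  {analytic_se:.4f}")
print(f"bootstrap SE: {bootstrap_se:.4f}")
# The two agree closely, which is exactly why bootstrapping the mean
# feels like it adds nothing here.
```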
I guess in class I have never thought about when I would ever use bootstrapping, as it's just kind of thrown in as a side note (here's bootstrapping, and look what it does). Is bootstrapping just another way to numerically solve something that we cannot do analytically? And if so, what's a good example of this?
Most explanations of bootstrapping I have seen tell you how to implement it or what it is, but never when you would actually apply it or what value it brings as an additional tool.