I'm trying to convince myself that the bootstrap fails when estimating extreme order statistics (and thus functions of them). This is a classic shortcoming of the bootstrap, laid out in some detail by Chernick (2007, 2011). Part of my reason for believing this is that alternative methods, such as the $m$-out-of-$n$ bootstrap, have been proposed to get around the problem with extreme values.
Consider bootstrapping the sample range $R = X_{(n)} - X_{(1)}$ using $n = 100$ observations from a $N(5, 1.5^2)$ distribution (mean 5, standard deviation 1.5).
This is easily accomplished in R as follows.
set.seed(1991)
x <- rnorm(100, 5, 1.5) # simulated data
r <- max(x) - min(x) # observed sample range = 10.02242
f <- function(x, i) {
  # statistic for boot(): range of the resample selected by index vector i
  return(max(x[i]) - min(x[i]))
}
library(boot)
y <- boot(x, f, R = 10000)
plot(y) # histogram and QQ plot
boot.ci(y) # confidence intervals
Intervals :
Level      Normal              Basic
95%   ( 8.97, 13.40 )   (10.02, 13.55 )

Level     Percentile            BCa
95%   ( 6.50, 10.02 )   ( 7.12, 10.02 )
Calculations and Intervals on Original Scale
Some BCa intervals may be unstable
The true range is captured in two of the above intervals.
I also notice that the observed value of the statistic is often an endpoint of the CI. I don't think this is a random occurrence, since R sometimes warns users about it, but I still count an endpoint value as falling within the interval.
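The endpoint behaviour can be checked directly. A resample reproduces the observed range exactly whenever it contains both the sample minimum and the sample maximum, which (assuming no ties) happens with probability $1 - 2(1 - 1/n)^n + (1 - 2/n)^n$, roughly $(1 - e^{-1})^2 \approx 0.40$ for large $n$. Since the observed range is also the largest value a resample can produce, the 97.5% bootstrap quantile sits exactly on it. Here is a minimal sketch of that check; it uses base R's `sample()` instead of `boot()` so it runs without extra packages, and the names `n_obs`, `r_obs`, `boot_ranges`, `p_hat` and `p_theory` are my own.

```r
set.seed(1991)
n_obs <- 100
x <- rnorm(n_obs, mean = 5, sd = 1.5)   # same simulated data as above
r_obs <- max(x) - min(x)                # observed sample range

# Ordinary n-out-of-n bootstrap of the range
boot_ranges <- replicate(10000, {
  xs <- sample(x, replace = TRUE)
  max(xs) - min(xs)
})

# Share of resamples whose range equals the observed range exactly
p_hat <- mean(boot_ranges == r_obs)

# Probability that a resample contains both the minimum and the maximum
p_theory <- 1 - 2 * (1 - 1 / n_obs)^n_obs + (1 - 2 / n_obs)^n_obs

print(c(simulated = p_hat, theoretical = p_theory))

# Upper percentile limit compared with the observed range
print(c(q975 = quantile(boot_ranges, 0.975, names = FALSE), observed = r_obs))
```

With this much mass piled on a single point, the upper percentile limit and the lower basic limit are both forced onto the observed range, which would explain the repeated 10.02 in the intervals.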
Here are the plots of the bootstrap distribution. This is what I'm getting at: looking at these, they suggest that something might be wrong, and indeed this is the case.
The plots below look nothing like a typical bootstrap distribution when $n$ is large. Typically, when we bootstrap "nice" (smooth) statistics such as the sample correlation, the plots end up approximating a Gaussian distribution quite closely.
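One way to put a number on how "un-smooth" the range is: a resampled range depends only on which observations end up as the minimum and the maximum, so it can take relatively few distinct values, whereas a smooth statistic such as the mean takes an essentially new value in every resample. The sketch below counts distinct bootstrap values for both; the mean is my own choice of comparison statistic, and the object names are illustrative.

```r
set.seed(1991)
x <- rnorm(100, mean = 5, sd = 1.5)
B <- 10000

# Each column of this 100 x B matrix is one bootstrap resample
resamples <- replicate(B, sample(x, replace = TRUE))

boot_range <- apply(resamples, 2, function(s) max(s) - min(s))
boot_mean  <- colMeans(resamples)

# Number of distinct values among the B bootstrap replicates
n_distinct <- c(range = length(unique(boot_range)),
                mean  = length(unique(boot_mean)))
print(n_distinct)
```

A histogram built from a handful of atoms cannot look Gaussian no matter how large `B` is, which is consistent with the spiky plots.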
Besides checking whether or not bootstrap confidence intervals capture the true parameter value (and looking at the distribution plots), is there any other evidence suggesting that the traditional bootstrap fails when estimating the sampling distribution of extreme order statistics? I'm thinking that interval coverage could also be examined, but is this typically done (or is there enough proof in the intervals and plots themselves)?
Edit (in response to the duplicate question): Many responses to similar questions on CV are quite theoretical. While I appreciate and understand the theory behind the bootstrap, many researchers in other fields do not, and they would appreciate applied answers that use simulations or real examples in R.
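To make the coverage idea concrete, here is a rough sketch of the kind of simulation I have in mind. Coverage needs a well-defined target, and the population range of a normal distribution is infinite, so this sketch swaps in Uniform(0, 1) data, whose population range is exactly 1; that swap is my assumption and not part of the example above. The percentile and basic intervals are computed by hand from bootstrap quantiles rather than with `boot.ci()`, and `n_sim` and `B` are kept small so it runs quickly.

```r
set.seed(1991)
n_obs <- 100       # sample size, as in the example above
n_sim <- 200       # number of simulated data sets (kept small for speed)
B <- 500           # bootstrap resamples per data set
true_range <- 1    # population range of Uniform(0, 1)

covers <- replicate(n_sim, {
  x <- runif(n_obs)
  t0 <- max(x) - min(x)
  t_star <- replicate(B, {
    xs <- sample(x, replace = TRUE)
    max(xs) - min(xs)
  })
  q <- quantile(t_star, c(0.025, 0.975), names = FALSE)
  c(
    # percentile interval: (q[1], q[2])
    percentile = q[1] <= true_range && true_range <= q[2],
    # basic interval: (2 * t0 - q[2], 2 * t0 - q[1])
    basic = 2 * t0 - q[2] <= true_range && true_range <= 2 * t0 - q[1]
  )
})

# Proportion of simulated data sets whose interval contains the true range
coverage <- rowMeans(covers)
print(coverage)
```

The percentile interval can never cover here, because no resampled range can exceed the observed range, which is itself strictly below 1. How far the basic interval falls from the nominal 95% is what the simulation would have to show.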