Intuition behind m-out-of-n bootstrap

Question

I am trying to get some intuition for why the m-out-of-n bootstrap works, but I haven't been able to find a good explanation. I would really appreciate any input on this.

I think I understand what the bootstrap is about: estimating how $\sqrt{n}(T_n(X_1,...,X_n)-T(X;F))$ behaves using $\sqrt{n}(T_n(X_1^*,...,X_n^*)-T(X;\hat{F_n}))$, where $X_1,...,X_n$ are drawn from $F$, the true CDF, and $X_1^*,...,X_n^*$ are drawn from $\hat{F_n}$, the ECDF. From my understanding, when $T$ is a smooth function, the bootstrap works fine. Sometimes when $T$ is non-smooth (such as extreme order statistics, or $|\mu|$), the m-out-of-n bootstrap can "smooth" things out and work.

My main question is:

Why does the m-out-of-n bootstrap "smooth" things out?

I have two more things that I want to make sure I am understanding correctly.

Since only $m$ observations are drawn, how can the behavior (variability, etc.) of $T_m(X_1^*,...,X_m^*)$ resemble that of a sample statistic based on $n$ observations, $T_n(X_1,...,X_n)$? Or is it only known that they agree asymptotically?
When using the m-out-of-n bootstrap to find a CI, do we need to scale the variance of $\sqrt{m}(T_m(X_1^*,...,X_m^*;\hat{F_n})-T(X;\hat{F_n}))$ by $\frac{n}{m}$, since we're drawing a smaller sample from $\hat{F_n}$?

Hope my questions are clear.

score 3 · Accepted Answer · answered Jul 07 '20 at 21:51

I would argue that it's not so much that the $m$ of $n$ bootstrap does smoothing as that it makes smoothing unnecessary.

There are two components to the $m$ of $n$ bootstrap. The first is sampling just $m$ observations; the second is knowing the convergence rate.

A big part of the advantage of subsampling is being able to handle the correct rate. If a statistic is $\sqrt{n}$-consistent and based on iid observations, the ordinary bootstrap pretty much has to work (Chapter 3.6 of van der Vaart & Wellner does this).

So, if you are looking to bootstrap the maximum, you need to know that it converges faster than $\sqrt{n}$ when you have a hard maximum. For example, with $U[0,\theta]$ you have $n(X_{(n)}-\theta)=O_p(1)$. That means you need to scale the variance by $m^2/n^2$, not $m/n$.
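To make the rate point concrete, here is a minimal sketch of an m-out-of-n bootstrap interval for $\theta$ in the $U[0,\theta]$ example. It is an illustration under stated assumptions, not a recommended procedure: the function name `m_out_of_n_max_ci`, resampling with replacement, the percentile-style inversion, and the choice $m=45$ for $n=2000$ are all my own choices. The key step is that the bootstrap root is scaled by $m$ (not $\sqrt{m}$) and then mapped back to the sample scale by dividing by $n$.

```python
import random

def m_out_of_n_max_ci(sample, m, reps=2000, alpha=0.05, seed=0):
    """Sketch of an m-out-of-n bootstrap CI for theta in U[0, theta].

    Uses the root n * (theta - X_(n)), whose bootstrap analogue is
    m * (theta_hat - max of a size-m resample). Resampling is with
    replacement here (an assumption; subsampling would draw without).
    """
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = max(sample)

    # Bootstrap roots, scaled at rate m because the maximum converges at rate n.
    roots = sorted(
        m * (theta_hat - max(rng.choices(sample, k=m)))
        for _ in range(reps)
    )
    q_lo = roots[int((alpha / 2) * reps)]
    q_hi = roots[int((1 - alpha / 2) * reps) - 1]

    # Invert q_lo <= n * (theta - theta_hat) <= q_hi for theta.
    return theta_hat + q_lo / n, theta_hat + q_hi / n

# Hypothetical data: 2000 draws from U[0, 3].
data_rng = random.Random(1)
sample = [data_rng.uniform(0, 3.0) for _ in range(2000)]
lo, hi = m_out_of_n_max_ci(sample, m=45)
print(lo, hi)
```

Note that the interval lies entirely at or above the sample maximum, as it should, since $\theta$ cannot be smaller than any observation.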

Another big part is reducing ties. Again, if you're going for the maximum, the ordinary bootstrap sample has the same maximum as the original sample about 0.632 of the time (the probability is $1-(1-1/n)^n \to 1-1/e$), whereas the sample never has the same maximum as the generating distribution. Subsampling with $m$ much smaller than $n$ means that the bootstrap sample usually doesn't have the same maximum as the original sample, and so you get a useful distribution over the bootstrap replicates. You don't need the smoothness in the statistic, because the distribution of replicates is less discrete.
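The 0.632 figure is easy to check numerically. The sketch below assumes resampling with replacement and no ties in the original sample; the helper names `prob_resample_hits_max` and `simulate_hit_rate` are made up for illustration. It compares the exact probability that a size-$m$ resample contains the sample maximum with a small simulation.

```python
import random

def prob_resample_hits_max(n, m):
    """Exact probability that a with-replacement resample of size m
    contains the sample maximum (assuming the sample has no ties)."""
    return 1 - (1 - 1 / n) ** m

def simulate_hit_rate(n, m, reps=2000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    # Only ranks matter, so index n - 1 stands in for the sample maximum.
    hits = sum(
        any(rng.randrange(n) == n - 1 for _ in range(m))
        for _ in range(reps)
    )
    return hits / reps

n = 200
print(prob_resample_hits_max(n, n), simulate_hit_rate(n, n))    # roughly 0.63
print(prob_resample_hits_max(n, 14), simulate_hit_rate(n, 14))  # much smaller
```

With $m = n$ the probability is close to $1-1/e$; with $m \ll n$ it is roughly $m/n$, which is why the replicates stop piling up on a single value.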

Thank you so much! This really helps. So far I've only studied statistics converging at the $\sqrt{n}$ rate. Do you happen to know any book or classic example that discusses statistics converging at a different rate? I really want to look more into this. Thanks again. — RevealedPreference, Jul 09 '20 at 00:59
*Subsampling* by Dimitris N. Politis, Joseph P. Romano, and Michael Wolf — Thomas Lumley, Jul 09 '20 at 01:29
