Is bootstrap problematic in small samples?

Question

In "3 Things That Bother Me" (1988), Ed Leamer writes:

Bootstrap estimates of standard errors are based on the assumption that the observed sample is the same as the true distribution, which is OK asymptotically. But a sample of size $n$ implies a distribution with $n$ mass points, which is quite unlike the true distribution if $n$ is small. For what sample sizes and what parent populations are the bootstrap estimates OK?
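To make the quoted assumption concrete: the nonparametric bootstrap resamples from the empirical distribution, which puts mass 1/n on each of the n observed points. Below is a minimal Python sketch, assuming NumPy and using the sample mean purely as an illustration (the data values are hypothetical). For the mean, the ideal bootstrap standard error (infinitely many resamples) equals the plug-in value, the standard deviation computed with divisor n divided by sqrt(n), so the Monte Carlo estimate should land close to it.

```python
import numpy as np

def bootstrap_se(x, stat, n_boot=20_000, seed=0):
    """Monte Carlo bootstrap standard error of stat(x).

    Each resample draws len(x) values with replacement from x, i.e. from the
    empirical distribution that puts mass 1/n on each observed point.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Indices for n_boot resamples, each of size n
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))
    # Apply the statistic to each resample (one row per resample)
    replicates = np.apply_along_axis(stat, 1, x[idx])
    return replicates.std(ddof=1)

x = np.array([2.1, 3.4, 1.9, 5.6, 4.2, 2.8, 3.3, 6.1])  # hypothetical data
se_boot = bootstrap_se(x, np.mean)
# For the mean, the ideal bootstrap SE is the plug-in SE, which uses the
# divisor n (ddof=0) rather than n - 1: the sample is treated as the population.
se_plugin = x.std(ddof=0) / np.sqrt(len(x))
print(round(se_boot, 3), round(se_plugin, 3))
```

The only randomness is the Monte Carlo resampling; with n = 8 the "population" being resampled is just these eight mass points, which is exactly Leamer's concern.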

I had the impression that one of the main uses of the bootstrap in statistics and econometrics is precisely in small samples. There, a bootstrap distribution is used when no analytical distribution is available and the sample is too small for the asymptotic distribution to be a good approximation of it. This makes Ed Leamer's criticism quite relevant and interesting. But perhaps my impression is wrong and I am misunderstanding things.

Q: Is this a valid piece of criticism? If so, has the problem been studied in any detail? Have any solutions been proposed?

What's small? It's surely standard that -- in plain bootstrapping -- if your sample is say 1,2,3 on one variable then some of the time bootstrapping will yield samples that are useless for many purposes (notably that many calculations fail if there are too many ties). And no bootstrap sample will include 0 or 5 or 42 if those are possible values. You know this, and it's "no free lunch" again. Some researchers prefer to simulate from a plausible parent and they then have to worry what that is. — Nick Cox, Aug 12 '20 at 12:00
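The 1,2,3 example in the comment above can be made concrete by enumerating every bootstrap resample. A small sketch using only the Python standard library: with n = 3 there are 3^3 = 27 equally likely ordered resamples, some of which consist of a single repeated value, and no resample can contain a value that was not observed.

```python
from itertools import product

data = (1, 2, 3)
n = len(data)
# Every ordered resample of size n is equally likely: n**n of them.
resamples = list(product(data, repeat=n))
# Resamples made of one repeated value have zero spread, so statistics such
# as a sample variance or a correlation are degenerate on them.
constant = [r for r in resamples if len(set(r)) == 1]
# Distinct resamples as multisets (order is irrelevant for most statistics).
distinct = {tuple(sorted(r)) for r in resamples}
# Values that can ever appear in a resample: only the observed ones.
support = set().union(*map(set, resamples))
print(len(resamples), len(constant), len(distinct), sorted(support))
# prints: 27 3 10 [1, 2, 3]
```

So one resample in nine is constant, and values such as 0, 5, or 42 never appear, whatever the parent population allows.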
Also, arguably, bootstrapping giving you poor results is sometimes to be written up as bootstrapping being a poor choice, but often the unwelcome lesson is that the dataset is not good enough for the analyst's (or client's) purposes. — Nick Cox, Aug 12 '20 at 12:01
@NickCox, right, *small* is not a concrete number. But perhaps my real question is, if we use bootstrap instead of the asymptotic distribution in a smallish sample because the asymptotic distribution is inapplicable given the sample size, is bootstrap just as inapplicable for the same reason? I used to think of bootstrap as a solution in case of a smallish sample. I am not talking $n=3$ but more like $n=30$ with a couple of parameters being estimated. — Richard Hardy, Aug 12 '20 at 12:10
(Continued) But now I start remembering that in some simple regression problems, bootstrap can have faster convergence than provided by asymptotic normality. Perhaps this is a justification for choosing bootstrap over an asymptotic distribution. — Richard Hardy, Aug 12 '20 at 12:13
I see the bootstrap primarily as a tool for when the distribution of whatever it may be is difficult to derive analytically. — einar, Aug 12 '20 at 12:47
@einar, it could well be the main use, I have only anecdotal knowledge. — Richard Hardy, Aug 12 '20 at 12:50
Some similar Qs: https://stats.stackexchange.com/questions/112147/can-bootstrap-be-seen-as-a-cure-for-the-small-sample-size, https://stats.stackexchange.com/questions/33300/determining-sample-size-necessary-for-bootstrap-method-proposed-method, https://stats.stackexchange.com/questions/261928/can-i-use-bootstrap-to-deal-with-extremely-small-sample-size, https://stats.stackexchange.com/questions/209474/bootstrap-for-small-sample, https://stats.stackexchange.com/questions/59665/bootstrapping-to-find-confidence-intervals-very-small-sample-size — kjetil b halvorsen, Aug 12 '20 at 15:37
@kjetilbhalvorsen, thank you! These were relevant threads. Consequently, my understanding is the following: we need the sample to be large enough so that it approximates the population well. Without that, bootstrap will not work well. I do not yet understand whether bootstrap will nevertheless work better than the asymptotic approximation (for certain pivotal quantities, it seemingly will) or not, and thus how universally it may be considered a cure for small sample size and preferred to the asymptotic distribution. — Richard Hardy, Aug 12 '20 at 17:17
Wouldn't bootstrap standard errors be most useful for something with a nasty distribution (e.g. median or IQR), even when the sample size is gigantic? — Dave, Aug 12 '20 at 17:31
I don't know either, but the following I found interesting: https://stats.stackexchange.com/questions/220013/can-we-use-bootstrap-samples-that-are-smaller-than-original-sample — kjetil b halvorsen, Aug 12 '20 at 17:33
Well, yes, but it looks interesting that it can work sometimes when the standard bootstrap does not. So I guess it can be a vehicle for understanding the bootstrap better. — kjetil b halvorsen, Aug 12 '20 at 18:04
@Dave, sure, it may be, though this is not the topic of the question. — Richard Hardy, Aug 12 '20 at 20:15
@RichardHardy Then for what do you want to approximate the analytical distribution? — Dave, Aug 12 '20 at 21:11
@Dave, for the same that you say, but in a small sample. But primarily my question is just as I have formulated. It arose while reading Ed Leamer's paper. — Richard Hardy, Aug 13 '20 at 04:59
@RichardHardy, a humble query and/or suggestion: "There, a bootstrap distribution is used when no analytical distribution is available and the sample is too small **for the asymptotic distribution** to be a good approximation of it." [Emphasis added] Should that be "for the asymptotic *normal* distribution", or are you thinking of limiting distributions other than the normal? — Alexis, Jan 22 '21 at 19:29
@NickCox "Plausible parent" sounds like (a) a good band name, (b) colorful family relationships, and (c) something said with a smirk to an irascible child's insistence on filial duty. :D — Alexis, Jan 22 '21 at 19:32
@Alexis The metaphors are enticing. For example, small samples from a lognormal can deny their parentage in the sense that the skewness and kurtosis of a lognormal parent can be rather high, but sample skewness and kurtosis are bounded by functions of sample size. — Nick Cox, Jan 22 '21 at 19:36
@Alexis, I did not see the need to exclude nonnormal asymptotic distributions, perhaps because I am not really familiar with them. Is there a reason to exclude them though? — Richard Hardy, Jan 22 '21 at 19:40
This is an interesting question because of the contrast with permutation testing. Even in small samples, permutation testing should not have poor type I error rate calibration (just low power). So it still does what I think it's doing. For the bootstrap, I'm not sure. — eric_kernfeld, Dec 27 '21 at 16:43

score 1 · Answer 1 · answered Jan 05 '22 at 11:15

My short answer would be: Yes, if samples are very small, this can definitely be a problem since the sample may not contain enough information to get a good estimate of the desired population parameter. This problem affects all statistical methods, not just the bootstrap.

The good news, however, is that ‘small’ may be smaller than most people (with knowledge of asymptotic behavior and the Central Limit Theorem) would intuitively assume. Here, of course, I’m referring to the ordinary (naive) bootstrap, without dependent data or other peculiarities. According to Michael Chernick, the author of ‘Bootstrap Methods: A Guide for Practitioners and Researchers’, small may be as small as N=4.

The limiting factor at such sizes is the number of distinct bootstrap samples, but this number grows very quickly with N, so it is not an issue even for sample sizes as small as 8.
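How fast that number grows can be checked directly. A resample is a multiset of size n drawn from n distinct observations, and a standard "stars and bars" count gives C(2n - 1, n) such multisets. A short Python sketch (standard library only) computes this and brute-forces it for small n as a sanity check:

```python
from itertools import combinations_with_replacement
from math import comb

def n_distinct_resamples(n):
    # Multisets of size n drawn from n distinct observations:
    # "stars and bars" gives C(2n - 1, n).
    return comb(2 * n - 1, n)

# Brute-force check for small n by listing the multisets directly
for n in range(1, 7):
    brute = sum(1 for _ in combinations_with_replacement(range(n), n))
    assert n_distinct_resamples(n) == brute

for n in (4, 8, 10, 20):
    print(n, n_distinct_resamples(n))
```

For n = 4 this gives 35 and for n = 8 already 6435. Note that these distinct resamples are not equally likely (a resample with all values different is far more probable than one repeating a single value), and the count says nothing by itself about how well the sample represents the population.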

For reference, see Chernick's great answer to a very similar question: Determining sample size necessary for bootstrap method / Proposed Method

Of course, the suggested sample sizes are subject to uncertainty, and no universal threshold for a minimum sample size can be specified. Chernick therefore suggests increasing the sample size and studying the convergence behavior. I believe this is a very reasonable approach.
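One way to follow that suggestion is a small simulation: draw samples of increasing size from a known parent, compute the bootstrap SE of the mean, and compare it with the true SE. The sketch below assumes NumPy; the lognormal parent, the sample sizes, and the replication counts are illustrative assumptions of mine, not taken from Chernick's answer.

```python
import numpy as np

def bootstrap_se_mean(x, rng, n_boot=500):
    # Bootstrap SE of the sample mean from n_boot resamples of x
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))
    return x[idx].mean(axis=1).std(ddof=1)

def se_ratio_by_n(sizes=(4, 8, 16, 32, 64), n_rep=200, seed=1):
    """Average ratio of bootstrap SE to true SE of the mean for each n.

    Assumed parent: standard lognormal (mu=0, sigma=1), whose variance is
    (e - 1) * e, so the true SE of the mean is sqrt((e - 1) * e / n).
    """
    rng = np.random.default_rng(seed)
    sd_true = np.sqrt((np.e - 1) * np.e)
    ratios = {}
    for n in sizes:
        se_true = sd_true / np.sqrt(n)
        # Fresh sample from the parent for each replication
        r = [bootstrap_se_mean(rng.lognormal(0.0, 1.0, size=n), rng) / se_true
             for _ in range(n_rep)]
        ratios[n] = float(np.mean(r))
    return ratios

for n, ratio in se_ratio_by_n().items():
    print(f"n = {n:3d}: mean(bootstrap SE / true SE) = {ratio:.2f}")
```

Ratios well below 1 at the smallest n would indicate that the bootstrap understates the sampling variability there; how quickly they approach 1 depends on the parent and the statistic, so the exercise has to be repeated for the case at hand.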

Here’s another quote from the same answer, which partly addresses the premise you quoted initially:

Whether or not the bootstrap principle holds does not depend on any individual sample "looking representative of the population". What it does depend on is what you are estimating and some properties of the population distribution (e.g., this works for sampling means with population distributions that have finite variances, but not when they have infinite variances). It will not work for estimating extremes regardless of the population distribution.
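The failure for extremes is easy to see for the sample maximum. With distinct values, a resample contains the largest observation unless all n draws miss it, so the bootstrap distribution of the maximum puts probability 1 - (1 - 1/n)^n on the single observed maximum (about 0.64 for n = 50, tending to 1 - 1/e, roughly 0.632). A minimal sketch, assuming NumPy and a uniform parent chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n, n_boot = 50, 10_000
x = rng.uniform(size=n)  # continuous parent, so all values are distinct

# Bootstrap replicates of the sample maximum
idx = rng.integers(0, n, size=(n_boot, n))
boot_max = x[idx].max(axis=1)

# Fraction of resamples whose maximum is exactly the observed maximum
frac_at_max = np.mean(boot_max == x.max())
# Theory: the top observation is included unless all n draws miss it
p_theory = 1 - (1 - 1 / n) ** n
print(round(frac_at_max, 3), round(p_theory, 3))
```

The true sampling distribution of the maximum of a continuous parent has no such point mass, and the mass does not shrink as n grows, which is one concrete reason the naive bootstrap does not work for extremes.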

Re the "good news": Chernick's discussion of the "number of bootstrap [re]samples" is irrelevant. The fact that there are exponentially many ways to resample even a small dataset tells us *absolutely nothing* about how effectively that dataset represents a property of interest in a population from which it was drawn. — whuber, Jan 05 '22 at 18:42
