If all you are doing is re-sampling from the empirical distribution, why not just study the empirical distribution? For example, instead of studying the variability by repeated sampling, why not just quantify the variability directly from the empirical distribution?
-
8"*(In this sense,) the bootstrap distribution represents an (approximate) nonparametric, noninformative posterior distribution for our parameter. But this bootstrap distribution is obtained painlessly - without having to formally specify a prior and without having to sample from the posterior distribution. Hence we might think of the bootstrap distribution as a “poor man’s” Bayes posterior.*" Hastie et al. *The Elements of Statistical Learning*". Sect. 8.4. – usεr11852 Mar 04 '18 at 15:36
-
It's still more pain than just studying the empirical distribution. For example, one line of code will return you the variance of the empirical distribution. It also takes much less time than re-sampling. – ztyh Mar 04 '18 at 15:37
-
How would we quantify the uncertainty of our estimates from the empirical distribution? – usεr11852 Mar 04 '18 at 15:39
-
I guess sometimes it's not easy to calculate the variability of the estimate. Thanks for pointing that out. – ztyh Mar 04 '18 at 15:43
-
@usεr11852 But I guess still, if your estimate is $T(X)$, then you want $E[T(X)^2]-E[T(X)]^2$, where the expectation is with respect to the empirical measure. How hard can this be? Calculate $T(X)$ or $T(X)^2$ for each sample observation and take the mean. So I'm not convinced... – ztyh Mar 04 '18 at 16:07
-
No, you don't want $E[T(X)^2]-E[T(X)]^2$ with respect to the empirical measure, you want it with respect to the true, unobserved measure, from which your data is but one sample. Bootstrapping is a way of getting a first-order (IIRC) approximation to many samples, which naturally will get you better results. – jbowman Mar 04 '18 at 16:12
-
Also if you map each sample $X$ to $T(X)$ you get a distribution of $T(X)$, which is the thing that the bootstrap samples will converge to after MANY samples. Why bother!! – ztyh Mar 04 '18 at 16:13
-
@jbowman The bootstrap is only using the data from that one sample (the empirical measure), so it's not first order at all. – ztyh Mar 04 '18 at 16:15
-
2"Under mild regularity conditions, the bootstrap yields an approximation to the distribution of an estimator or test statistic that is at least as accurate as the approximation obtained from first-order asymptotic theory ". http://www.unc.edu/~saraswat/teaching/econ870/fall11/JH_01.pdf. – jbowman Mar 04 '18 at 16:35
-
@jbowman that approximation is only of the empirical measure you observed. Which you already have! So why not just use it... – ztyh Mar 04 '18 at 16:39
-
You are arguing, not trying to understand. Believe me, you have not come to a realization that the bootstrap is worthless, contrary to the judgment of many thousands of statisticians over four or so decades. You did not read the quote carefully. I think you have failed to grasp the key role randomness plays in statistics. Statements like "Why bother!!" with respect to "get a distribution of $T(X)$" are... unusual, to say the least. If you don't think it important to understand the distribution of your estimates, you might want to consider why the field of statistics exists at all, and re-think that. – jbowman Mar 04 '18 at 16:42
-
@ztyh You say "if you map each sample $X$ to $T(X)$ you get a distribution of $T(X)$". Perhaps you should think about this: how would you map a single point $X_i$ to $T(X) = \bar{X}$? Or any function $T(X_1, X_2, \cdots X_n)$ for that matter. – knrumsey Mar 04 '18 at 20:07
3 Answers
Bootstrapping (or other resampling) is an experimental method to estimate the distribution of a statistic.
It is a very straightforward and easy method (it just means computing the statistic on many random variants of the sample data in order to obtain an estimate of its distribution).
You most likely use it when the 'theoretical/analytical' expression is too difficult to obtain or calculate (or, as Aksakal says, sometimes it is unknown).
- Example 1: You do a PCA analysis and wish to compare the results with 'estimates of the deviation of the eigenvalues' under the hypothesis that there is no correlation between the variables.
You could scramble the data many times and re-compute the PCA eigenvalues, so that you get a distribution (based on random tests with the sample data) for the eigenvalues; see the sketch below.
Note that current practice is gazing at a scree plot and applying rules of thumb in order to 'decide' whether a certain eigenvalue is significant/important or not.
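A minimal sketch of this scrambling idea (the data matrix, number of variables, and number of scrambles below are made-up illustrations, not anything from the answer):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: n observations of p variables with some correlation built in.
n, p = 200, 5
X = rng.normal(size=(n, p))
X[:, 1] += 0.5 * X[:, 0]          # introduce correlation between two columns

def pca_eigenvalues(data):
    """Eigenvalues of the correlation matrix (PCA on standardized data), descending."""
    return np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

observed = pca_eigenvalues(X)

# Scramble each column independently many times; this destroys any correlation
# between variables while keeping each marginal distribution intact.
n_scrambles = 1000
null_eigs = np.empty((n_scrambles, p))
for b in range(n_scrambles):
    scrambled = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
    null_eigs[b] = pca_eigenvalues(scrambled)

# Compare the observed leading eigenvalue with its 'no correlation' distribution.
print("observed largest eigenvalue:", observed[0])
print("95th percentile under scrambling:", np.percentile(null_eigs[:, 0], 95))
```

Comparing the observed leading eigenvalue against this scrambled distribution replaces the scree-plot rule of thumb with an explicit reference distribution.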
- Example 2: You did a non-linear regression y ~ f(x), giving you estimates for a bunch of parameters of the function f. Now you wish to know the standard error of those parameters.
A simple look at the residuals plus some linear algebra, as in OLS, is not possible here. However, an easy way is to compute the same regression many times with the residuals/errors re-scrambled, in order to get an idea of how the parameters would vary (given that the distribution of the error term can be modeled by the observed residuals).
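A hedged sketch of the residual re-scrambling idea, using scipy's `curve_fit`; the exponential model, the data, and the number of bootstrap rounds are invented for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)

def f(x, a, b):
    """Toy nonlinear model y = a * exp(b * x)."""
    return a * np.exp(b * x)

# Made-up data
x = np.linspace(0, 2, 50)
y = f(x, 2.0, 0.7) + rng.normal(scale=0.3, size=x.size)

# Original fit and its residuals
params, _ = curve_fit(f, x, y, p0=[1.0, 0.5])
residuals = y - f(x, *params)

# Residual bootstrap: refit many times with resampled residuals added
# back onto the fitted values.
boot = []
for _ in range(1000):
    y_star = f(x, *params) + rng.choice(residuals, size=residuals.size, replace=True)
    p_star, _ = curve_fit(f, x, y_star, p0=params)
    boot.append(p_star)
boot = np.array(boot)

print("bootstrap standard errors for (a, b):", boot.std(axis=0, ddof=1))
```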

-
I think your example is not a bootstrap. It's just sampling from a known null distribution. Bootstrap is where you have one sample and repeatedly sample again from that sample. – ztyh Mar 04 '18 at 16:55
-
In your question you imagine calculating the variance of a sample, which is indeed simple and does not require bootstrapping. In my example I speak about a situation in which we have a value that is derived from the sample. Then we cannot simply compute a variance anymore, yet we still wish to know how it varies. By scrambling the data many times and re-computing the PCA eigenvalues you get such a distribution: (random) data that follows the distribution of your sample. If I am not mistaken, this *is* called bootstrapping. – Sextus Empiricus Mar 04 '18 at 17:01
-
Ok, I see where I was misunderstanding things. Your example makes sense. Thanks. – ztyh Mar 04 '18 at 17:07
The key thing is that the bootstrap isn't really about figuring out features of the distribution of the data, but rather figuring out features of an estimator applied to the data.
Something like the empirical distribution function will give you a fairly good estimate of the CDF from which the data came... but in isolation, it tells you essentially nothing about how reliable the estimators we build from that data will be. That is the question answered by using the bootstrap.
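As a hedged illustration of this point, here is a minimal nonparametric bootstrap of one estimator (the toy data and the choice of the median as the statistic are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(42)

# One observed sample from some unknown distribution (made up here).
sample = rng.exponential(scale=2.0, size=100)

# The empirical distribution is just the sample itself; what it does not
# directly hand you is the sampling variability of an estimator such as
# the median, which the bootstrap approximates.
def bootstrap_se(data, statistic, n_boot=2000, rng=rng):
    """Standard error of `statistic` via nonparametric bootstrap resampling."""
    stats = np.array([
        statistic(rng.choice(data, size=data.size, replace=True))
        for _ in range(n_boot)
    ])
    return stats.std(ddof=1)

print("sample median:", np.median(sample))
print("bootstrap SE of the median:", bootstrap_se(sample, np.median))
```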

-
Using the (non-parametric) bootstrap to find "the distribution of the data" would be a laugh: it merely comes up with the empirical distribution function, which is exactly the set of data the analyst began with. Reminds me of college algebra when I'd "solve for X" and find "X=X". – AdamO Mar 16 '18 at 18:49
IF you know exactly what the underlying distribution is, then you don't need to study it. Sometimes, in the natural sciences, you know the distribution exactly.
IF you know the type of the distribution, then you only need to estimate its parameters, and study it in the sense you meant. For instance, sometimes you know a priori that the underlying distribution is normal. In some cases you even know its mean. So, for a normal the only thing left to find out is the standard deviation. You get the sample standard deviation from the sample, and voila, you get the distribution to study.
IF you don't know what the distribution is, but think that it is one of several in a list, then you could try to fit those distributions to the data and pick the one that fits best. THEN you study that distribution.
FINALLY, often you don't know the type of distribution you're dealing with, and you don't have a reason to believe that it belongs to one of the 20 distributions that R can fit your data to. What are you going to do? OK, you look at the mean and standard deviation, nice. But what if it's very skewed? What if its kurtosis is very large? And so on. You really need to know all the moments of the distribution to know and study it. So, in this case non-parametric bootstrapping comes in handy. You don't assume much; you simply sample from it, then study its moments and other properties.
Non-parametric bootstrapping is not a magical tool, though; it has issues. For instance, it can be biased. I think parametric bootstrapping is unbiased.
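For contrast with the nonparametric version, here is a minimal parametric-bootstrap sketch (the normal model and the median statistic are illustrative assumptions, not anything prescribed above):

```python
import numpy as np

rng = np.random.default_rng(7)

# One observed sample (made up).
sample = rng.normal(loc=5.0, scale=2.0, size=50)

# Parametric bootstrap: fit a parametric model (here, a normal),
# then resample from the fitted model instead of from the data itself.
mu_hat, sigma_hat = sample.mean(), sample.std(ddof=1)

n_boot = 2000
boot_stats = np.array([
    np.median(rng.normal(loc=mu_hat, scale=sigma_hat, size=sample.size))
    for _ in range(n_boot)
])

print("estimated sampling SD of the median (parametric bootstrap):",
      boot_stats.std(ddof=1))
```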

-
I think even if you did not know the true distribution, many moments are easy to calculate. So I think the problem is not with not knowing the type of distribution you are dealing with. Rather it is about what kind of statistic you are trying to study. Some statistics might be hard to calculate, and only then is the bootstrap useful. – ztyh Mar 04 '18 at 15:57
-
As in my comment on the question to usεr11852, I actually have doubts about the benefits with regard to computability of statistics as well... – ztyh Mar 04 '18 at 16:10
-
Actually I think it's still a no-brainer. You map each sample to $\ln(x^3+x)$. Then finding the quantile is again one line of code. So two lines of code in total. – ztyh Mar 04 '18 at 16:22
-
Quantile was a stupid example, I'll give you that. Try the mean instead. In my practice I have to forecast $x*z$ or even more complex functions $f(x,z)$ where $x,z$ are from an unknown joint distribution. I need to obtain properties of the final forecast. Try that with moments. With bootstrapping it's a no-brainer. – Aksakal Mar 04 '18 at 16:28
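A hedged sketch of that use case, resampling $(x, z)$ pairs jointly so their dependence is preserved (the toy joint distribution and the choice $f(x,z)=x z$ are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up paired observations (x_i, z_i) from some unknown joint distribution.
n = 200
x = rng.gamma(shape=2.0, scale=1.0, size=n)
z = x + rng.normal(scale=0.5, size=n)       # x and z are dependent

def f(x, z):
    """Some complicated function of the forecast inputs (toy choice)."""
    return x * z

# Resample (x, z) pairs together so the dependence between them is preserved,
# then look at the distribution of the forecast f(x, z).
n_boot = 2000
boot_means = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)
    boot_means[b] = f(x[idx], z[idx]).mean()

print("bootstrap 90% interval for E[f(x, z)]:",
      np.percentile(boot_means, [5, 95]))
```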
-
However complicated $f$ may be, all you have to do is map the samples of $x$ and $z$ to $f(x,z)$. Then study those mapped samples. If you can use the bootstrap, then that means you can do this, and this is much easier... – ztyh Mar 04 '18 at 16:37
-
@ztyh, OK, why don't you reproduce Table 1 from here: https://pdfs.semanticscholar.org/4aad/1756e88dba86399a75891895e00b160f5460.pdf Don't be confused by the Monte Carlo lingo; this is a classical use of bootstrapping. It's a very well-known work in certain circles. – Aksakal Mar 04 '18 at 17:00