Questions tagged [bootstrap]

The bootstrap is a resampling method to estimate the sampling distribution of a statistic.

The bootstrap is a technique for estimating the sampling distribution of a statistic. It works by resampling from a dataset (typically with replacement), re-estimating the parameters on each resampled dataset, and comparing those estimates to the (known) values computed from the original dataset. Many variants of the bootstrap exist for specialized analyses.
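As a rough illustration of the resample, re-estimate, and compare steps described above, here is a minimal sketch of a nonparametric bootstrap for the sample mean. It uses only the Python standard library. The function names (`bootstrap_distribution`, `bootstrap_summary`), the example data, and the default of 2000 resamples are assumptions made for this sketch, not part of any particular library or of the tag description.

```python
import random
import statistics


def bootstrap_distribution(data, statistic, n_resamples=2000, seed=0):
    """Return the statistic recomputed on n_resamples bootstrap resamples.

    Each resample draws len(data) observations from data with replacement.
    (Hypothetical helper written for this illustration.)
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]


def bootstrap_summary(data, statistic=statistics.mean, n_resamples=2000, seed=0):
    """Compare the bootstrap replicates with the estimate from the original data."""
    estimate = statistic(data)  # the "known" value for the dataset itself
    replicates = bootstrap_distribution(data, statistic, n_resamples, seed)
    std_error = statistics.stdev(replicates)       # bootstrap standard error
    bias = statistics.mean(replicates) - estimate  # bootstrap bias estimate
    return estimate, std_error, bias


# Made-up example data, used only to show the call.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.3]
estimate, std_error, bias = bootstrap_summary(sample)
print(f"estimate={estimate:.3f}  bootstrap SE={std_error:.3f}  bias={bias:.3f}")
```

The spread of the replicates around the original estimate stands in for the unknown sampling distribution. Other statistics (for example `statistics.median`) can be passed as `statistic`, although the naive scheme sketched here is not appropriate for every statistic or for dependent data.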

For an extensive review of the bootstrap see:

  • Horowitz, J.L. (2001) "The Bootstrap", in: J.J. Heckman & E.E. Leamer (eds.), Handbook of Econometrics, Edition 1, Vol. 5, Chapter 52, pp. 3159-3228
  • Efron, B. and Tibshirani, R.J. (1994) "An Introduction to the Bootstrap", Chapman & Hall/CRC Monographs on Statistics & Applied Probability
1719 questions
393 votes, 11 answers

Explaining to laypeople why bootstrapping works

I recently used bootstrapping to estimate confidence intervals for a project. Someone who doesn't know much about statistics recently asked me to explain why bootstrapping works, i.e., why is it that resampling the same sample over and over gives…
Alan H.
134 votes, 5 answers

What is the .632+ rule in bootstrapping?

Here @gung makes reference to the .632+ rule. A quick Google search doesn't yield an easy to understand answer as to what this rule means and for what purpose it is used. Would someone please elucidate the .632+ rule?
russellpierce
130 votes, 4 answers

Differences between cross validation and bootstrapping to estimate the prediction error

I would like your thoughts about the differences between cross validation and bootstrapping to estimate the prediction error. Does one work better for small dataset sizes or large datasets?
grant
101 votes, 3 answers

What are examples where a "naive bootstrap" fails?

Suppose I have a set of sample data from an unknown or complex distribution, and I want to perform some inference on a statistic $T$ of the data. My default inclination is to just generate a bunch of bootstrap samples with replacement, and calculate…
raegtin
89 votes, 2 answers

Resampling / simulation methods: Monte Carlo, bootstrapping, jackknifing, cross-validation, randomization tests, and permutation tests

I am trying to understand the differences between resampling methods (Monte Carlo simulation, parametric bootstrapping, non-parametric bootstrapping, jackknifing, cross-validation, randomization tests, and permutation tests) and their…
Ram Sharma
81 votes, 4 answers

Can bootstrap be seen as a "cure" for the small sample size?

This question has been triggered by something I read in this graduate-level statistics textbook and also (independently) heard during this presentation at a statistical seminar. In both cases, the statement was along the lines of "because the sample…
James
69 votes, 4 answers

Assumptions regarding bootstrap estimates of uncertainty

I appreciate the usefulness of the bootstrap in obtaining uncertainty estimates, but one thing that's always bothered me about it is that the distribution corresponding to those estimates is the distribution defined by the sample. In general, it…
user4733
59 votes, 5 answers

Is it true that the percentile bootstrap should never be used?

In the MIT OpenCourseWare notes for 18.05 Introduction to Probability and Statistics, Spring 2014 (currently available here), it states: The bootstrap percentile method is appealing due to its simplicity. However it depends on the bootstrap…
Clarinetist
59 votes, 1 answer

Bootstrap vs. jackknife

Both the bootstrap and the jackknife can be used to estimate the bias and standard error of an estimate, and the mechanisms of the two resampling methods are not hugely different: sampling with replacement vs. leaving out one observation at a time. However,…
Tu.2
55 votes, 6 answers

Why on average does each bootstrap sample contain roughly two thirds of observations?

I have run across the assertion that each bootstrap sample (or bagged tree) will contain on average approximately $2/3$ of the observations. I understand that the chance of not being selected in any of $n$ draws from $n$ samples with replacement is…
xyzzy
52 votes, 6 answers

Rule of thumb for number of bootstrap samples

I wonder if someone knows any general rules of thumb regarding the number of bootstrap samples one should use, based on characteristics of the data (number of observations, etc.) and/or the variables included?
hoyem
48 votes, 3 answers

Is it possible to interpret the bootstrap from a Bayesian perspective?

Ok, this is a question that keeps me up at night. Can the bootstrap procedure be interpreted as approximating some Bayesian procedure (except for the Bayesian bootstrap)? I really like the Bayesian "interpretation" of statistics which I find nicely…
Rasmus Bååth
48 votes, 3 answers

Bootstrap vs. permutation hypothesis testing

There are several popular resampling techniques that are often used in practice, such as bootstrapping, permutation tests, the jackknife, etc. Numerous articles and books discuss these techniques, for example Philip I Good (2010) Permutation,…
Tu.2
46 votes, 2 answers

How do you do bootstrapping with time series data?

I recently learned about using bootstrapping techniques to calculate standard errors and confidence intervals for estimators. What I learned was that if the data is IID, you can treat the sample data as the population, and do sampling with…
statnub
46 votes, 3 answers

How are Random Forests not sensitive to outliers?

I've read in a few sources, including this one, that Random Forests are not sensitive to outliers (in the way that Logistic Regression and other ML methods are, for example). However, two pieces of intuition tell me otherwise: Whenever a decision…
makansij