
I have run across the assertion that each bootstrap sample (or bagged tree) will contain on average approximately $2/3$ of the observations.

I understand that the chance of a given observation not being selected in any of $n$ draws, sampling with replacement from $n$ observations, is $(1- 1/n)^n$, which works out to approximately a $1/3$ chance of not being selected.

What is a mathematical explanation for why this formula always gives approximately $1/3$?

gung - Reinstate Monica
xyzzy
  • I believe this is the origin of the $.632$ in the bootstrap 632+ rule. – gung - Reinstate Monica Mar 06 '14 at 02:46
  • How is 1-1 in the numerator not 0? I have an equation for the chance of not being picked in a particular bootstrap sample of size n of ((n-1)/n)**n. But now I have to figure out how to connect it to Euler's number. – mLstudent33 Feb 19 '20 at 03:45

6 Answers


More precisely, each bootstrap sample (or bagged tree) will contain, on average, approximately $1-\frac{1}{e} \approx 0.632$ of the unique observations in the original sample.

Let's go over how the bootstrap works. We have an original sample $x_1, x_2, \ldots x_n$ with $n$ items in it. We draw items with replacement from this original set until we have another set of size $n$.

From that, it follows that the probability of choosing any one item (say, $x_1$) on the first draw is $\frac{1}{n}$. Therefore, the probability of not choosing that item is $1 - \frac{1}{n}$. That's just for the first draw; there are a total of $n$ draws, all of which are independent, so the probability of never choosing this item on any of the draws is $(1-\frac{1}{n})^n$.

Now, let's think about what happens when $n$ gets larger and larger. We can take the limit as $n$ goes towards infinity, using the usual calculus tricks (or Wolfram Alpha): $$ \lim_{n \rightarrow \infty} \big(1-\frac{1}{n}\big)^n = \frac{1}{e} \approx 0.368$$

That's the probability of an item not being chosen. Subtract it from one to find the probability of the item being chosen, which gives you 0.632.
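The convergence is quick; here is a minimal sketch in Python (the thread's only code below is R, but the same one-liner works there) that evaluates $(1-1/n)^n$ for increasing $n$:

```python
import math

# (1 - 1/n)^n approaches 1/e ≈ 0.367879 from below as n grows
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}: (1 - 1/n)^n = {(1 - 1 / n) ** n:.6f}")
print(f"limit 1/e       = {1 / math.e:.6f}")
```

Already at $n=100$ the value agrees with $1/e$ to about three decimal places.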

Matt Krause
  • Why does the sign flip for 1/e from the definition of e? Inside the brackets, for e it was (1+1/n)^n and now it is (1-1/n)^n – mLstudent33 Feb 19 '20 at 03:54
  • 1
    @Glen_B nails this in the last line of his answer. Start with $e^x = \lim_{n\to \infty} \left(1 + x/n \right) ^n$ and drop in $x=-1$ to get $\frac{1}{e}$ instead. You'll get $\lim_{n\to \infty} \left(1 + \frac{-1}{n}\right)^n$, which simplifies to $\lim_{n\to \infty} \left(1 - \frac{1}{n}\right)^n$. – Matt Krause Feb 19 '20 at 15:48
  • "Fact 2: When $n$ is large, $\exp(x/n) \approx 1+x/n$. This follows from the series expansion for $e^x$." So the $x$ rather than $1$ in the original equation is the result of some sort of approximation after working out a series. I've only touched a bit on Taylor and Fourier, but is it either one of these that yields that approximation? – mLstudent33 Feb 19 '20 at 23:13
  • Exactly, it's the Taylor series: $e^x = 1 + \frac{x^1}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$, but with $x$ replaced by $\frac{x}{n}$. – Matt Krause Feb 20 '20 at 22:56

Essentially, the issue is to show that $\lim_{n\to\infty}(1- 1/n)^n=e^{-1}$
(and of course, $e^{-1} =1/e \approx 1/3$, at least very roughly).

It doesn't work at very small $n$ -- e.g. at $n=2$, $(1- 1/n)^n=\frac{1}{4}$. It passes $\frac{1}{3}$ at $n=6$, passes $0.35$ at $n=11$, and $0.366$ by $n=99$. Once you go beyond $n=11$, $\frac{1}{e}$ is a better approximation than $\frac{1}{3}$.
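Those threshold values are easy to verify directly; a small Python sketch (the specific $n$ values just reproduce the claims above):

```python
def p_never_chosen(n):
    """Probability a given item is never drawn in n draws with replacement."""
    return (1 - 1 / n) ** n

print(p_never_chosen(2))   # exactly 0.25
print(p_never_chosen(6))   # ≈ 0.3349, first n where the value exceeds 1/3
print(p_never_chosen(11))  # ≈ 0.3505, first n where the value exceeds 0.35
print(p_never_chosen(99))  # ≈ 0.3660
```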

[Figure: $(1-1/n)^n$ plotted against $n$, rising toward its limit]

The grey dashed line is at $\frac{1}{3}$; the red and grey line is at $\frac{1}{e}$.

Rather than show a formal derivation (which can easily be found), I'm going to give an outline (that is an intuitive, handwavy argument) of why a (slightly) more general result holds:

$$e^x = \lim_{n\to \infty} \left(1 + x/n \right)^n$$

(Many people take this to be the definition of $\exp(x)$, but you can prove it from simpler results such as defining $e$ as $\lim_{n\to \infty} \left(1 + 1/n \right)^n$.)

Fact 1: $\exp(x/n)^n=\exp(x)$. This follows from basic results about powers and exponentiation.

Fact 2: When $n$ is large, $\exp(x/n) \approx 1+x/n$. This follows from the series expansion for $e^x$.

(I can give fuller arguments for each of these, but I assume you already know them.)

Substitute (2) in (1). Done. (For this to work as a more formal argument would take some work, because you'd have to show that the remaining terms in Fact 2 don't become large enough to cause a problem when taken to the power $n$. But this is intuition rather than formal proof.)

[Alternatively, just take the Taylor series for $\exp(x/n)$ to first order. A second easy approach is to take the binomial expansion of $\left(1 + x/n \right) ^n$ and take the limit term-by-term, showing it gives the terms in the series for $\exp(x/n)$.]

So if $e^x = \lim_{n\to \infty} \left(1 + x/n \right) ^n$, just substitute $x=-1$.

Immediately, we have the result at the top of this answer, $\lim_{n\to\infty}(1- 1/n)^n=e^{-1}$


As gung points out in comments, the result in your question is the origin of the .632+ bootstrap rule

e.g. see

Efron, B. and Tibshirani, R. (1997),
"Improvements on Cross-Validation: The .632+ Bootstrap Method,"
Journal of the American Statistical Association, Vol. 92, No. 438 (Jun.), pp. 548-560

Glen_b

Just adding to @retsreg's answer, this can also be demonstrated quite easily via numerical simulation in R:

N <- 1e7 # number of instances and sample size
bootstrap <- sample(c(1:N), N, replace = TRUE)
round((length(unique(bootstrap))) / N, 3)
## [1] 0.632
vonjd

Sampling with replacement can be modeled as a sequence of binomial trials where "success" is an instance being selected. For an original dataset of $n$ instances, the probability of "success" is $1/n$, and the probability of "failure" is $(n-1)/n$. For a sample size of $b$, the probability of selecting an instance exactly $x$ times is given by the binomial distribution:

$$ P(x,b,n) = \bigl(\frac{1}{n}\bigr)^{x} \bigl(\frac{n-1}{n}\bigr)^{b-x} {b \choose x}$$

In the specific case of a bootstrap sample, the sample size $b$ equals the number of instances $n$. Letting $n$ approach infinity, we get:

$$ \lim_{n \rightarrow \infty} \bigl(\frac{1}{n}\bigr)^{x} \bigl(\frac{n-1}{n}\bigr)^{n-x} {n \choose x} = \frac{1}{ex!}$$

If our original dataset is big, we can use this formula to compute the probability that an instance is selected exactly $x$ times in a bootstrap sample. For $x = 0$, the probability is $1/e$, or roughly $0.368$. The probability of an instance being sampled at least once is thus $1 - 0.368 = 0.632$.
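The limit $\frac{1}{ex!}$ is the Poisson distribution with mean $1$. Here is a quick Python comparison of the exact binomial probabilities against that limit (the choice $n = 10{,}000$ is arbitrary, just "big"):

```python
import math
from math import comb, factorial

def binom_pmf(x, n):
    """P(a given instance is drawn exactly x times in n draws with replacement)."""
    return comb(n, x) * (1 / n) ** x * ((n - 1) / n) ** (n - x)

def poisson1_pmf(x):
    """The limit 1/(e * x!), i.e. the Poisson(1) mass function."""
    return 1 / (math.e * factorial(x))

n = 10_000
for x in range(4):
    print(f"x = {x}: exact = {binom_pmf(x, n):.5f}, limit = {poisson1_pmf(x):.5f}")
```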

Needless to say, I painstakingly derived this using pen and paper, and did not even consider using Wolfram Alpha.

retsreg

If you want to look deeper into the sample coverage of the bootstrap, it is worth noting that simple-random-sampling with replacement gives an "occupancy number" that follows the classical occupancy distribution (see e.g., O'Neill 2019). Suppose we have an original sample containing $n$ data points and we take a bootstrap resample, also with $n$ points. Let $K_n$ denote the number of data points in the original sample that appear in the resample. It is well-known that this quantity follows the classical occupancy distribution, with mass function:

$$\mathbb{P}(K_n=k) = \text{Occ}(k|n,n) = \frac{(n)_k \cdot S(n,k)}{n^n}.$$

(The values $(n)_k = \prod_{i=1}^k (n-i+1)$ are the falling factorials and the values $S(n,k)$ are the Stirling numbers of the second kind.) The mean and variance of this occupancy number are:

$$\begin{align} \mathbb{E}(K_n) &= n \bigg[ 1 - \bigg( 1-\frac{1}{n} \bigg)^n \bigg], \\[6pt] \mathbb{V}(K_n) &= n \bigg[ (n-1) \bigg(1-\frac{2}{n} \bigg)^n + \bigg(1-\frac{1}{n} \bigg)^n - n \bigg(1-\frac{1}{n} \bigg)^{2n} \bigg]. \\[6pt] \end{align}$$

Taking $n \rightarrow \infty$ we get the asymptotic equivalence:

$$\mathbb{E}(K_n) \sim n \bigg( 1-\frac{1}{e} \bigg) \quad \quad \quad \mathbb{V}(K_n) \sim \frac{n}{e} \bigg( 1-\frac{1}{e} \bigg).$$

Consequently, as $n \rightarrow \infty$ the proportion of the original data points that are covered by the resample converges to $K_n/n \rightarrow 1-1/e \approx 0.632$. Whilst this is a slightly more complicated presentation of the issue, examination of the classical occupancy distribution allows you to fully describe the stochastic behaviour of the sample coverage.
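The exact mean formula above is easy to check against simulation; a Python sketch (the sample size $n=200$, replication count, and seed are arbitrary choices):

```python
import math
import random

def simulated_mean_coverage(n, reps=2000, seed=1):
    """Simulate E(K_n): average count of distinct original points in a resample."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        # one bootstrap resample; the set keeps only distinct indices
        total += len({rng.randrange(n) for _ in range(n)})
    return total / reps

n = 200
exact = n * (1 - (1 - 1 / n) ** n)   # E(K_n) from the formula above
limit = n * (1 - 1 / math.e)         # asymptotic value, ~0.632 * n
print(simulated_mean_coverage(n), exact, limit)
```

All three numbers agree closely even at this modest $n$.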

Ben

This can be easily seen by counting. How many total possible (ordered) samples? $n^n$. How many NOT containing a specific value? $(n-1)^n$. The probability of a sample not containing a specific value is therefore $(n-1)^n/n^n = (1-1/n)^n$, which is about $1/3$ (more precisely, $1/e$) in the limit.
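A one-line check of this counting argument in Python, with $n = 5$ as an arbitrary small example:

```python
# Direct count for n = 5: ordered samples avoiding one particular value.
n = 5
total = n ** n          # n^n possible ordered samples
without = (n - 1) ** n  # (n-1)^n ordered samples that never pick the given value
print(without / total)  # prints 0.32768, i.e. (1 - 1/5)^5
```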

Maxim Khesin