
I am reading the description of random forests (RF) here.

In the "How random forests work" section, it is written that:

When the training set for the current tree is drawn by sampling with replacement, about one-third of the cases are left out of the sample. This oob (out-of-bag) data is used to get a running unbiased estimate of the classification error as trees are added to the forest

I am unable to understand whether the one-third of the cases (the out-of-bag sample size) is:

  • an arbitrary value defined in the algorithm
  • an estimate (e.g., on average, sampling with replacement will leave out about one-third of the cases)

or something else.

    Does this answer your question? [Why on average does each bootstrap sample contain roughly two thirds of observations?](https://stats.stackexchange.com/questions/88980/why-on-average-does-each-bootstrap-sample-contain-roughly-two-thirds-of-observat) – Sycorax Jul 15 '20 at 00:01

1 Answer


It comes from the construction of a bootstrap sample: you draw $n$ observations with replacement from the original $n$ observations, so the probability that any particular observation is omitted from the sample is $(1-\frac{1}{n})^n.$* Now consider the definition $\exp(-1)=\lim \limits_{n\to\infty}(1-\frac{1}{n})^n$ and observe that $\exp(-1)=0.3678\ldots\approx\frac{1}{3}.$
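For a quick numerical illustration, here is a minimal simulation sketch (the sizes, $n=1000$ observations and $2000$ bootstrap replicates, are arbitrary choices) showing the out-of-bag fraction settling near $\exp(-1)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000        # number of observations (arbitrary choice)
n_boot = 2000   # number of bootstrap replicates (arbitrary choice)

oob_fractions = []
for _ in range(n_boot):
    # draw n indices with replacement, as when building one bootstrap sample
    in_bag = rng.integers(0, n, size=n)
    # observations that were never drawn are "out of bag"
    n_oob = n - np.unique(in_bag).size
    oob_fractions.append(n_oob / n)

print(np.mean(oob_fractions))  # ~0.368, i.e. roughly exp(-1), about one third
print(np.exp(-1))              # 0.3678794...
```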


*To verify this, I will define the probability space of the bootstrap: $\Omega=\{x_1, x_2, x_3, \dots, x_n\}$, where each $x_i$, $i\in I=\{i\in\mathbb{N}:i\le n\}$, is an observation, and $\mathcal{F}=2^\Omega$. We will denote the bootstrap sample as $B$. Note that we can take this $\sigma$-field $\mathcal{F}$ because we must have a finite number of observations. Collecting our bootstrap sample one observation at a time, our event of interest $E$ occurs when some observation $x_i$ is selected for the bootstrap sample, and we must define a probability measure for it. That is, $P(E)=P(\{x_i \in B\})$.

We can think of drawing a bootstrap sample as an experiment with $n$ trials. Each trial selects one of our observations uniformly at random with replacement, so it will either include $x_i$ with probability $P(E)=\frac{|E|}{|\Omega|}=\frac{1}{n},$ or exclude $x_i$ with probability $P(E^c)=\frac{|\Omega|-|E|}{|\Omega|}=1-\frac{1}{n}.$ Our probability space $(\Omega, \mathcal{F}, P)$ is now completely defined. Writing $E_j$ for the event that $x_i$ is selected on trial $j$, the probability that $x_i$ is omitted from all $n$ trials is $P\left(\bigcap_{j=1}^n E_j^c\right)=\prod_{j=1}^n P(E_j^c)=\left(1-\frac{1}{n}\right)^n$.
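The limiting behaviour is also easy to check numerically; a small sketch evaluating $(1-\frac{1}{n})^n$ for a few (arbitrarily chosen) values of $n$ shows it approaching $\exp(-1)$:

```python
import math

for n in (10, 100, 1_000, 10_000, 1_000_000):
    # probability that a fixed observation is excluded from all n draws
    p_omit = (1 - 1 / n) ** n
    print(f"n = {n:>9}: (1 - 1/n)^n = {p_omit:.6f}")

print(f"limit: exp(-1) = {math.exp(-1):.6f}")  # ~0.367879
```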

    +1. Could you please remind us how to get the $(1-\frac{1}{n})^{n}$ result? I know everyone should know that, but in reality that may not be the case so I think that would be helpful. – Antoine Sep 24 '15 at 17:16
  • @Antoine I just learned about probability spaces this semester, so my expansion is probably overkill... – Sycorax Sep 24 '15 at 21:36
  • Excellent explanation, thanks. The right side of your last line holds because the events are independent right? (independence coming from the fact that sampling is done with replacement). – Antoine Sep 25 '15 at 09:07
  • The trials must be independent by construction: each event has a specified probability that does not depend on anything but the cardinality of $E$ and $\Omega$ because we're sampling with replacement. – Sycorax Sep 25 '15 at 11:10
  • @Antoine it's because of the Taylor series that generates $e^x$. Here's a proof: http://aleph0.clarku.edu/~djoyce/ma122/elimit.pdf – riders994 Apr 19 '17 at 03:31