11

Just want to check some reasoning.

If my original sample is of size $n$ and I bootstrap it, then my thought process is as follows:

$\frac{1}{n}$ is the chance of any observation drawn from the original sample. To ensure the next draw is not the previously sampled observation, we restrict the sample size to $n-1$. Thus, we get this pattern:

$$ \frac{1}{n} \cdot \frac{1}{n-1} \cdot \frac{1}{n-2} \cdots \frac{1}{n-(n-1)} = \frac{1}{n!}. $$

Is this correct? I stumble on why it can't be $(\frac{1}{n})^n$ instead.

Jim
  • 1,912
  • 2
  • 15
  • 20
Jayant.M
  • 385
  • 1
  • 8
  • 1
    I'm not sure I'm following you. Why do you want to "ensure the next draw is not the previous sample"? In bootstrapping, the idea is to sample with replacement. That is, you *do* want it to be possible that the next draw is the same as one you've already drawn. – gung - Reinstate Monica Jan 23 '17 at 02:07
  • but won't that mean the bootstrapped sample is not the same as the original sample? – Jayant.M Jan 23 '17 at 02:15
  • I don't follow you. You don't necessarily want the bootsample to be identical to your sample, you just want to treat the sample as a model of the population. – gung - Reinstate Monica Jan 23 '17 at 02:26
  • 1
    So my question is what is the chance that the bootstrap sample IS the same as the original sample. I am interested in the bootstrap being identical to the sample – Jayant.M Jan 23 '17 at 02:36
  • Sorry if my question wasn't clear! – Jayant.M Jan 23 '17 at 02:37
  • If you sample $n$ items with replacement from a group of size $n$, there are $n^n$ possible samples. Only one them is the original sample. So, if all units are equally likely to be selected at each draw, the answer is $n^{-n}$. – gammer Jan 23 '17 at 02:43
  • Because, the more I think about it, the less certain I am that it's right. It could be more than $n^{-n}$ due to something related to the ordering being irrelevant. I'm not sure and don't feel like thinking about it any more. You're more than welcome to finish it off, gung. – gammer Jan 23 '17 at 02:57

1 Answers1

17

Note that at each observation position ($i=1, 2, ..., n$) we can choose any of the $n$ observations, so there are $n^n$ possible resamples (keeping the order in which they are drawn) of which $n!$ are the "same sample" (i.e. contain all $n$ original observations with no repeats; this accounts for all the ways of ordering the sample we started with).

For example, with three observations, a,b and c, you have 27 possible samples:

aaa aab aac aba abb abc aca acb acc 
baa bab bac bba bbb bbc bca bcb bcc 
caa cab cac cba cbb cbc cca ccb ccc 

Six of those contain one each of a, b and c.

So $n!/n^n$ is the probability of getting the original sample back.

Aside - a quick approximation of the probability:

Consider that:

$${\displaystyle {\sqrt {2\pi }}\ n^{n+{\frac {1}{2}}}e^{-n}\leq n!\leq e\ n^{n+{\frac {1}{2}}}e^{-n}}$$

so

$${\displaystyle {\sqrt {2\pi }}\ n^{{\frac {1}{2}}}e^{-n}\leq n!/n^n \leq e\ n^{{\frac {1}{2}}}e^{-n}}$$

With the lower bound being the usual one given for the Stirling approximation (which has low relative error for large $n$).

[Gosper has suggested using $n! \approx \sqrt{(2n+\frac13)\,\pi}n^ne^{-n}$ which would yield the approximation $\sqrt{(2n+\frac13)\pi}\,e^{-n}$ for this probability, which works reasonably well down to $n=3$, or even down to $n=1$ depending on how stringent your criteria are.]


(Response to comment:) The probability of not getting a particular observation in a given resample is $(1-\frac{1}{n})^n$ which for large $n$ is approximately $e^{-1}$.

For details see
Why on average does each bootstrap sample contain roughly two thirds of observations?

Glen_b
  • 257,508
  • 32
  • 553
  • 939
  • Thank you! as a point of interest, what is the chance of not getting a particular entry in a sample? for example with the distribution of $a,b,c$ you gave, there is a 8/27 chance of not getting a sample with an $a$ – Jayant.M Jan 23 '17 at 03:14
  • 1
    That's already covered in other answers on site but I've added it above (briefly). – Glen_b Jan 23 '17 at 03:26
  • 1
    So, this is the probability of getting a sample which is a permutation of the original sample. Instead, the probability of getting exactly the same sequence as in the original sample (thus, same elements in the same order) is $(\frac {1}{n})^n$. Right? – DeltaIV Jan 23 '17 at 08:32
  • 1
    @deltaiv yes, only one of the $n!$ arrangements is in the original order. – Glen_b Jan 23 '17 at 09:07
  • thanks. Nice answer btw, I didn't know about this Gosper approximation (+1). – DeltaIV Jan 23 '17 at 10:48
  • 1
    Doesn’t Gosper’s approximation work well even down to $n=1$, not just down to $n=3$? I think 0.499 (for $n=2$) is a pretty good approximation to 0.5, and 0.996 (for $n=1$) is also quite close to 1.0. – Karl Ove Hufthammer Jan 26 '17 at 21:45
  • @Karl My yardstick for a good approximation was a little stricter than yours, but in any case the exact values are so easy to calculate for small $n$. I've modified my answer a little there, – Glen_b Jan 27 '17 at 00:01