4

A box contains $n$ balls numbered from 1 to $n$. Suppose you take a ball at a time, putting it back on the box, until you pick a ball twice. How many balls are you expected to take from the box?

Let $X$ be the r. v. of interest. Its support is every natural number from $2$ to $n+1$. If $k$ is one such number, to get the first repetition on the $k$-th pick, the $k-1$ previous ones must be distinct and $k$-th equal to some of those, therefore, $$\begin{align} \forall k\in\mathbb N\cap[2,\,n\!+\!1]\qquad \mathbb P(X=k) &= \frac nn\frac {n-1}n\ldots\frac {n-(k-2)}n \;\cdot\; \frac{k-1}n =\\ &=\frac{(n-1)!}{(n-k+1)!}\frac{k-1}{n^{k-1}} =\\ &= \frac{k-1}n\,\prod_{l=0}^{k-2}\left(1-\frac{l}n\right)\quad, \end{align}$$ so that $$\begin{align} E(X) &= \sum_{k=2}^{n+1}\,k\cdot\frac{k-1}n\,\prod_{l=0}^{k-2}\left(1-\frac{l}n\right) =\\ &=\frac1n\sum_{k=2}^{n+1}\,k(k-1)\prod_{l=0}^{k-2}\left(1-\frac{l}n\right)\quad. \end{align}$$

However, the book answer is $$E(X) = 2 + \sum_{k=1}^{n-1}\prod_{l=1}^k\left(1-\frac ln\right)\quad.$$

What am I doing wrong? Or, if the answers are actually the same, how to show that?

Luke
  • 425
  • 1
  • 3
  • 11

1 Answers1

4

Both answers are the same. The solution in the question is clearly presented and holds up. For reference, the answers it gives for $n=2, 3, \ldots, 7$ are

$$E(X) = 2,\frac{5}{2},\frac{26}{9},\frac{103}{32},\frac{2194}{625},\frac{1 223}{324},\frac{472730}{117649}, \ldots$$

A simpler way to obtain the expectation is to sum the survival function $S_n(k) = \mathbb{P}(X\gt k), k\ge 0.$ This is the probability that the first $k$ balls will be unique:

$$S_n(k) = \prod_{l=1}^{k-1} \left(1 - \frac{l}{n}\right).$$

Clearly $S_n(0)=S_n(1)=1$ (the value, by definition, of an empty product). Accounting for their sum separately yields

$$E(X) = S_n(0) + S_n(1) + \sum_{j=2}^n S_n(j) = 2 + \sum_{k=1}^{n-1} S_n(k+1).$$

(writing $j=k+1$ for the last step). That is the textbook answer and it gives the same sequence of values. Incidentally, a useful closed form expression for these values is

$$E(X) = 1 + n \int_0^{\infty } e^{-n t} (1+t)^{n-1} \, dt.$$

For instance, expanding the log of the integrand through second order at $0$ and integrating that MacLaurin series gives the approximation $$E(X)\approx \frac{2}{3} + \sqrt{\frac{\pi n}{2}}$$ for large $n$; it turns out to have two significant figures of accuracy even for $n\ge 5$ and three for $n \ge 75.$

whuber
  • 281,159
  • 54
  • 637
  • 1,101
  • Indeed, when asking the distribution of $X$ on the previous item, the book suggested investigating $\mathbb{P}(X\gt k)$, but I found it via the rationale I showed on the question. – Luke May 20 '14 at 01:37
  • 1
    I rolled back an edit because it reflected a misunderstanding of the (conventional) notation: an empty product is, *by definition,* equal to $1$. Thus when $k\le 1$, $\prod_{l=1}^{k-1}f(l) = 1$ for any function $f$. Although it might seem like a technical nicety, including the possibility of such empty products in the formula reflects the process of computing the survival function for $k=0$ and $k=1$, which--because such values of $k$ are less than the support of $X$--is (axiomatically) equal to $1$. – whuber May 20 '14 at 15:21