22

Suppose I have a chest. When you open the chest, there is a 60% chance of getting a prize and a 40% chance of getting 2 more chests. Let $X$ be the number of prizes you get. What is its variance?

Computing $E[X]$ is fairly straight forward: $E[X] = .4 \cdot 2 \cdot E[X] + .6$ which leads to $E[X] = 3$, but I'd also like to know the variance of the number of prizes, not just the average. $Var[X] = E[X^2] - E[X]^2 = E[X^2] - 9$, but I'm having trouble with $E[X^2]$. Anyone have any idea if this is simple? From simulation, I know that the variance is ~30.

Thanks

kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467
Brian
  • 331
  • 1
  • 5

2 Answers2

26

Call the next chests as $X_1,X_2$. With $0.4$ probability, our new variable is $X_1+X_2$ and with $0.6$ probability, it is $1$. So,
$$\begin{align}E[X^2]&=0.4\times E[(X_1+X_2)^2]+0.6\times1^2\\&=0.4\times E[X_1^2+X_2^2+2X_1X_2]+0.6\\&=0.4\times(2E[X^2]+2E[X]^2)+0.6\\&=0.8\times E[X^2]+7.8\rightarrow E[X^2]=39\rightarrow\operatorname{var}(X)=30\end{align}$$

gunes
  • 49,700
  • 3
  • 39
  • 75
20

Actually, it's relatively simple to obtain formulas for the entire distribution as well as an easy procedure to compute any moment of it.


For $n=1,2,3,\ldots,$ let $f_n(p) = \Pr(X=n)$ with $p=0.6.$ Define

$$F_p(t) = f_1(p)t + f_2(p)t^2 + \cdots + f_n(p)t^n + \cdots$$

(the probability generating function). The problem asserts

$$F_p(t) = p\,t + (1-p)F_p^2(t),$$

a quadratic equation with solutions

$$F_p(t) = \frac{1}{2(1-p)}\left(1 \pm \sqrt{1 - 4p(1-p)t}\right).$$

Only the solution with a minus sign makes sense (because the other yields a negative value for $f_2(p)$). Expanding it as a formal power series in $t$ (using, for instance, the Binomial Theorem) gives

$$F_p(t) = \frac{1}{2(1-p)} \sum_{n=1}^\infty (-1)^{n-1} \binom{1/2}{n} \left(4p(1-p)\right)^n\,t^n = \sum_{n=1}^\infty f_n(p)\,t^n,$$

from which we can read off the entire distribution of $X$ term by term. Here's a plot of the log probabilities up to $n=80$ created using this formula:

Figure

(In R:

f <- function(p=0.6, n=1:80) (-1)^(n-1) * choose(1/2, n) * (4*p*(1-p))^n / (2*(1-p))
plot(f(), type="h", log="y")

)

Moreover,

$$E[X] = \frac{\mathrm{d}}{\mathrm{d}t}F_p(t)\bigg|_{t=1} = \frac{p}{\sqrt{1-4p(1-p)}} = 3$$

and

$$E[X(X-1)] = \frac{\mathrm{d}^2}{\mathrm{d}t^2}F_p(t)\bigg|_{t=1} = \frac{2p^2(1-p)}{\sqrt{\left(1 - 4p(1-p)\right)^3}} = 36,$$

whence

$$\operatorname{Var}(X) = E[X(X-1)] + E[X] - E[X]^2 = 36 + 3 - 3^2 = 30.$$

whuber
  • 281,159
  • 54
  • 637
  • 1,101
  • 7
    Wonderful answer! (Wish I could give it more than +1.) For those of us lesser mortals who find it awkward to read binomial coefficients with fractions in them, the mass function [can be further "simplified"](https://math.stackexchange.com/questions/340124/binomial-coefficients-1-2-choose-k) to: $$f_n(p) = \frac{2}{n} {2(n-1) \choose n-1} p^n (1-p)^n.$$ – Ben Apr 16 '20 at 00:56
  • (+1) My answer transformed into a pet compared to this one :) – gunes Apr 16 '20 at 09:31
  • 6
    @Gunes I think your answer is excellent (I upvoted it before even contemplating posting this answer). I created this one because I realized we had few or no posts illustrating techniques of working with pgfs. It was helpful to have your answer available as a check that I was doing the algebra correctly, too. – whuber Apr 16 '20 at 12:02
  • @whuber can be also the t script included? Thanks. – Maximilian Apr 17 '20 at 10:17
  • @Maximilian I'm not sure what you are asking for: I posted the `R` code already. – whuber Apr 17 '20 at 10:54
  • Sorry overlooked this tiny two liner! Not usual for your standards – Maximilian Apr 17 '20 at 11:31