8

I am rolling a fair die. What is the probability distribution of number of rolls until I first accumulate either: 1) Five ones 2) 20 occurrences of faces that are not one?

I'm happy to share the actual application if that would help.

Ben Bolker
  • 34,308
  • 2
  • 93
  • 126
  • 1
    Does this have an application other than getting a good grade on homework or a take-home exam? – Mark L. Stone Feb 05 '18 at 03:32
  • 3
    Sorry. I think I misunderstood the nature of the site. I'm neither a student nor a professional statistician. I was looking for advice on a real world problem. – Alec Walker Feb 05 '18 at 16:16
  • It was cast in a way that made it look like a textbook problem, and that will often lead people to think you're trying to get people to do homework. However, it's not just for that reason that it may be better not to over-abstract (textbookify) the problem -- often there are aspects of the original problem that may be important in considering a solution that a poster, unaware there could be any statistical issue related to it, has simply abstracted away and of which then no trace remains. – Glen_b Feb 05 '20 at 01:07

2 Answers2

11

You are performing the equivalent of throwing a coin with a probability $p=1/6$ of heads until either $a=5$ heads or $b=20$ tails ("non-heads") have appeared. If you have thrown it $n$ times, the chance of this event not happening is given by the Binomial distribution as

$$S(n;a,b,p) = \sum_{k=\max(0,n-b+1)}^{\min(n,a-1)} \binom{n}{k} p^k(1-p)^{n-k}.$$

(The sum equals zero whenever its lower limit exceeds its upper limit.)

Therefore the chance that $n\gt 0$ is the throw when either $a$ heads or $b$ tails are first observed is

$$f(n;a,b,p) = S(n-1;a,b,p) - S(n;a,b,p).$$

Obviously this must equal $0$ for $n \lt \min(a,b)$ or $n \ge a+b$. We therefore may easily report the entire distribution: here is the plot of its probability function $f$ between $0$ and $a+b=25,$ as computed by these formulas:

Figure


This simple solution becomes even simpler (and yields additional information about whether the tosses terminated with $a$ heads or $b$ tails) when we recognize the question can be framed as a random walk in the $(x,y)$ plane.

Start at the origin $(0,0)$. Whenever the coin comes up heads, move one unit up; otherwise, move one unit to the right. Stop the first time one of the absorbing barriers $y=a$ or $x=b$ is hit.

The geometry of this situation is shown in the second figure. It plots the points that can be reached on this walk, showing the absorbing barriers as black lines. The possible terminal points along those barriers are marked with black dots.

Figure 2

The number of times each terminal point was reached in 1000 iterations of this walk are depicted by the colors and sizes of the larger points. The path shown in red corresponds to a sequence in which one tail was observed, then one head, then 10 tails, a head, a tail, two heads, four tails, and a head. It comprised 21 coin tosses altogether.

Each path that reaches any particular point $(x,y)$ on the absorbing barrier consists of $x$ tails and $y$ heads and therefore has a chance of $p^y(1-p)^x$. Clearly, the last outcome in any path that terminates at $(x,a)$ was a heads. The number of such paths therefore is the number of distinct paths connecting $(0,0)$ to $(x,a-1)$, of which there are $\binom{x+a-1}{a-1}$. Consequently the chance of terminating at $(x,a)$ is

$$\Pr(x,a) = \binom{x+a-1}{a-1} p^{a}(1-p)^x.$$

Similarly the chance of terminating at $(b,y)$ is

$$\Pr(b,y) = \binom{y+b-1}{b-1} p^y(1-p)^b.$$

The chance of terminating after $n$ steps, with $\min(a,b)\le n \lt a+b-1$, therefore is the sum of two such expressions (one of which may be zero):

$$f(n;a,b,p) = \binom{n-1}{a-1} p^a(1-p)^{n-a} + \binom{n-1}{b-1} p^{n-b}(1-p)^b\text{ if }\min(a,b)\le n \lt a+b.$$

This counts the number of $n$-step paths that reach the absorbing barrier at the top or to the right, respectively, weighting each one by its probability.


The sudden leap in probability at $n=20$ in the first figure is now explained: for the first time (compared to smaller values of $n$), it becomes possible to end the tosses at the righthand barrier. This happens in a great number of cases, because it's (slightly) more likely that the right barrier will be reached before the top barrier is. (The chance of reaching the right barrier first is readily found by summing the probabilities associated with its five points, which is almost $63\%$.) We know that ending the walk at the right barrier is more probable because on average a path will rise by one unit $p=1/6$ of the time but will move to the right one unit $1-p=5/6$ of the time, for an average slope of $1/6:5/6 = 1/5$. A path with that slope reach the absorbing region at the location $(20, 20/5)=(20,4)$: on the righthand barrier.

whuber
  • 281,159
  • 54
  • 637
  • 1,101
  • 1
    Lovely, both solutions. The second looks like the sum of the two component negative binomials, in the range min(a,b)≤n – Alec Walker Feb 06 '18 at 18:04
  • It does. The connection becomes even more apparent when you interpret a negative binomial in terms of a random walk with a linear absorbing barrier. – whuber Feb 06 '18 at 18:14
0

Having slept on it, I think the strategy may be this:

  1. Convert each of the negative binomial probability distributions to conditional probabilities. i.e. conditional on not having gotten 5 ones in n-1 rolls, what is the probability of getting a 5th one on the nth roll?

From n=1 to sufficiently large,

  1. sum the two conditional probabilities, and multiply the complement by S(n-1), the cumulative "survival" through the (n-1)th roll.

  2. Take successive differences S(n-1)-S(n) to recover the probability distribution.

The setting is one comparative monitoring safety of marketed health products. You have two compared groups, possibly of unequal size, followed over time. Each adverse event is a binomial trial, as the event can derive from either drug A or drug B.

  • Although this approach is too vaguely described to evaluate, it probably doesn't give quite the right answer, because such combinations of objectives in sequential procedures tend to involve counterintuitive subtleties. See https://stats.stackexchange.com/questions/12174 for a generalization of your question. – whuber Feb 05 '18 at 23:37
  • Not only vague, but wrong! Thank you for your lucid explanations. – Alec Walker Feb 06 '18 at 18:04