I am rolling a fair die. What is the probability distribution of number of rolls until I first accumulate either: 1) Five ones 2) 20 occurrences of faces that are not one?
I'm happy to share the actual application if that would help.
I am rolling a fair die. What is the probability distribution of number of rolls until I first accumulate either: 1) Five ones 2) 20 occurrences of faces that are not one?
I'm happy to share the actual application if that would help.
You are performing the equivalent of throwing a coin with a probability $p=1/6$ of heads until either $a=5$ heads or $b=20$ tails ("non-heads") have appeared. If you have thrown it $n$ times, the chance of this event not happening is given by the Binomial distribution as
$$S(n;a,b,p) = \sum_{k=\max(0,n-b+1)}^{\min(n,a-1)} \binom{n}{k} p^k(1-p)^{n-k}.$$
(The sum equals zero whenever its lower limit exceeds its upper limit.)
Therefore the chance that $n\gt 0$ is the throw when either $a$ heads or $b$ tails are first observed is
$$f(n;a,b,p) = S(n-1;a,b,p) - S(n;a,b,p).$$
Obviously this must equal $0$ for $n \lt \min(a,b)$ or $n \ge a+b$. We therefore may easily report the entire distribution: here is the plot of its probability function $f$ between $0$ and $a+b=25,$ as computed by these formulas:
This simple solution becomes even simpler (and yields additional information about whether the tosses terminated with $a$ heads or $b$ tails) when we recognize the question can be framed as a random walk in the $(x,y)$ plane.
Start at the origin $(0,0)$. Whenever the coin comes up heads, move one unit up; otherwise, move one unit to the right. Stop the first time one of the absorbing barriers $y=a$ or $x=b$ is hit.
The geometry of this situation is shown in the second figure. It plots the points that can be reached on this walk, showing the absorbing barriers as black lines. The possible terminal points along those barriers are marked with black dots.
The number of times each terminal point was reached in 1000 iterations of this walk are depicted by the colors and sizes of the larger points. The path shown in red corresponds to a sequence in which one tail was observed, then one head, then 10 tails, a head, a tail, two heads, four tails, and a head. It comprised 21 coin tosses altogether.
Each path that reaches any particular point $(x,y)$ on the absorbing barrier consists of $x$ tails and $y$ heads and therefore has a chance of $p^y(1-p)^x$. Clearly, the last outcome in any path that terminates at $(x,a)$ was a heads. The number of such paths therefore is the number of distinct paths connecting $(0,0)$ to $(x,a-1)$, of which there are $\binom{x+a-1}{a-1}$. Consequently the chance of terminating at $(x,a)$ is
$$\Pr(x,a) = \binom{x+a-1}{a-1} p^{a}(1-p)^x.$$
Similarly the chance of terminating at $(b,y)$ is
$$\Pr(b,y) = \binom{y+b-1}{b-1} p^y(1-p)^b.$$
The chance of terminating after $n$ steps, with $\min(a,b)\le n \lt a+b-1$, therefore is the sum of two such expressions (one of which may be zero):
$$f(n;a,b,p) = \binom{n-1}{a-1} p^a(1-p)^{n-a} + \binom{n-1}{b-1} p^{n-b}(1-p)^b\text{ if }\min(a,b)\le n \lt a+b.$$
This counts the number of $n$-step paths that reach the absorbing barrier at the top or to the right, respectively, weighting each one by its probability.
The sudden leap in probability at $n=20$ in the first figure is now explained: for the first time (compared to smaller values of $n$), it becomes possible to end the tosses at the righthand barrier. This happens in a great number of cases, because it's (slightly) more likely that the right barrier will be reached before the top barrier is. (The chance of reaching the right barrier first is readily found by summing the probabilities associated with its five points, which is almost $63\%$.) We know that ending the walk at the right barrier is more probable because on average a path will rise by one unit $p=1/6$ of the time but will move to the right one unit $1-p=5/6$ of the time, for an average slope of $1/6:5/6 = 1/5$. A path with that slope reach the absorbing region at the location $(20, 20/5)=(20,4)$: on the righthand barrier.
Having slept on it, I think the strategy may be this:
From n=1 to sufficiently large,
sum the two conditional probabilities, and multiply the complement by S(n-1), the cumulative "survival" through the (n-1)th roll.
Take successive differences S(n-1)-S(n) to recover the probability distribution.
The setting is one comparative monitoring safety of marketed health products. You have two compared groups, possibly of unequal size, followed over time. Each adverse event is a binomial trial, as the event can derive from either drug A or drug B.