
I was assigned this problem and I am struggling to see how I should approach it. The question is as follows:

Let $W_1$ and $W_2$ be independent geometric random variables with parameters $p_1$ and $p_2$. Find P($W_1$ = $W_2$), P($W_1$ $\ge$ $W_2$), and P($W_1$ $\le$ $W_2$).

I know that once we figure out the equals case, we can use that and some manipulation to get the other two, but I would appreciate a little guidance there as well, as I am not too confident in how to do that.

What I thought I could do was take the general terms for $W_1$ and $W_2$, which would be of the form $p_1^k(1-p_1)^{n-k}$, and form some kind of equality with them, but I don't think that is right. My other thought was that for the two variables to be equal, each variable must have resulted in the same value ($W_1 = X = W_2$), so we could use the general terms to calculate the probability that both resulted in the same value, $P(W_1 = X, W_2 = X)$, in a joint-distribution table, but I don't see how we could generalize that. Is either of these approaches on the right track?

Mohammed
  • Please provide more details on what you know about geometric distributions, probability calculations, and independence, and what you tried so far. And please add the `self-study` tag. – Xi'an Oct 08 '20 at 18:52
  • A path to the solution can be found here: https://math.stackexchange.com/questions/2205734/difference-between-two-independent-geometric-distribution, on the topic 'Difference between two independent geometric distributions'. – AJKOER Oct 08 '20 at 19:35
  • In my opinion, potential answers could also provide paths to comparative statements for other distributions. To date I have modeled (per the outlined methodology) the Geometric and, recently, the Exponential distribution: differences of Exponentials are Laplace distributed (http://www.math.wm.edu/~leemis/chart/UDR/PDFs/ExponentialLaplace.pdf), allowing an alternate estimate of Pr[X1 <= X2] via the Laplace distribution of Y = X1 - X2, and therefrom Pr[Y = X1 - X2 > 0], which again confirmed the Uniform Ratio distribution path for Pr[X1 <= X2]. – AJKOER Oct 09 '20 at 23:55

3 Answers


By remembering how the geometric distribution arises, we can solve this problem with almost no calculation.

The problem can be seen as a competition

A geometric random variable $W$ models the number of failures in a sequence of independent Bernoulli trials before the first success is observed. Its parameter $p$ is the chance of success in each trial.
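
In symbols, with this failures-before-the-first-success convention (the same one R's `rgeom` uses), the probability mass and survival functions are

$$\Pr(W = k) = (1-p)^k\,p \quad\text{and}\quad \Pr(W \ge k) = (1-p)^k, \qquad k = 0, 1, 2, \ldots.$$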

The usual metaphor for a Bernoulli$(p)$ trial is the flip of a coin with probability $p.$ The problem, then, can be phrased in terms of a competition. It consists of a series of turns that is continued until a definite outcome is achieved:

You hold a coin with probability $p_1$ of heads and I hold a coin with probability $p_2$ of heads. On each turn we both flip our coins. If your coin is heads and mine is tails, you win; if my coin is heads and yours is tails, I win; if both coins are heads, we tie; and if both are tails, we continue the series. What are the chances that (i) you win, (ii) I win, (iii) we tie, (iv) the series goes on forever?

The competition will have a definite outcome

Let's deal with that last possibility right away: at each turn the series will continue only when we each observe tails, which has a chance of $q=(1-p_1)(1-p_2).$ The chance of continuing through $n=1,2,\ldots$ turns without a definite outcome therefore is $q^n.$ Provided $q\lt 1,$ this converges to $0$ as $n$ grows. Thus, unless both coins always come up tails ($p_1=p_2=0$), the chance of (iv), that the series never terminates, is zero.
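
For instance, if both coins were fair ($p_1 = p_2 = 1/2$), then $q = 1/4$ and the chance that the series lasts beyond ten turns is already only

$$q^{10} = (1/4)^{10} \approx 9.5\times 10^{-7}.$$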

The problem can be restated in terms of the competition's outcome

We have seen the game will eventually terminate. If, after it is over, the loser were to continue flipping until they, too, observed a heads, then the numbers of tails each of us flips before our first heads will be realizations of geometric random variables $W_1$ and $W_2$ with parameters $p_1$ and $p_2.$ Evidently, you win when $W_1$ is less than $W_2,$ I win when $W_1$ exceeds $W_2,$ and otherwise we tie.

A simple equation determines the chance you win

Let's consider your chances of winning in a little more detail. On the current turn you can win in exactly two ways: either (a) you toss heads and I toss tails, in which case you win outright, or (b) we both toss tails, in which case the game effectively starts over and your chance of winning from that point is the same as it was at the beginning. The chance of (a) is $p_1(1-p_2)$ (because our tosses are independent) and the chance of (b) is $(1-p_1)(1-p_2).$ Therefore,

$$\Pr(W_1 \lt W_2) = \Pr(\text{You win}) = p_1(1-p_2) + (1-p_1)(1-p_2)\Pr(\text{You win}).$$

This simple (linear) equation for your winning chances is easily solved to give

$$\Pr(W_1 \lt W_2) = \Pr(\text{You win}) = \frac{p_1(1-p_2)}{1 - (1-p_1)(1-p_2)} = \frac{p_1 -p_1p_2}{p_1+p_2-p_1p_2}.$$
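
The same value can be checked by summing directly over the possible values of $W_1$: you win with $W_1 = k$ exactly when you observe $k$ tails followed by a heads while my first $k+1$ tosses are all tails, so, using $\Pr(W_2 \gt k) = (1-p_2)^{k+1},$

$$\Pr(W_1 \lt W_2) = \sum_{k=0}^{\infty} \Pr(W_1 = k)\Pr(W_2 \gt k) = \sum_{k=0}^{\infty} (1-p_1)^k p_1 (1-p_2)^{k+1} = \frac{p_1(1-p_2)}{1 - (1-p_1)(1-p_2)}.$$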

The rest is easy

Interchanging our roles merely swaps the subscripts, from which we read off

$$\Pr(W_1 \gt W_2) = \Pr(W_2 \lt W_1) = \frac{p_2 -p_1p_2}{p_1+p_2-p_1p_2}.$$

The chance of a tie plus the chance that somebody wins must equal $1,$ because the chance that this game goes on forever is zero. Thus

$$\Pr(W_1=W_2) = 1 - (\Pr(W_1 \lt W_2) + \Pr(W_1 \gt W_2)) = \frac{p_1p_2}{p_1+p_2-p_1p_2}.$$
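
As a quick numerical cross-check of all three closed forms (a minimal R sketch; the values $p_1 = 0.3$ and $p_2 = 0.4$ and the truncation point are arbitrary), compare them with truncated sums over the joint pmf:

```
# Cross-check the closed forms against truncated sums over the joint pmf.
# p1, p2 are arbitrary example values; 1000 terms make the truncation
# error negligible here.
p1 <- 0.3; p2 <- 0.4
k  <- 0:1000
pmf1 <- (1 - p1)^k * p1               # Pr(W1 = k)
pmf2 <- (1 - p2)^k * p2               # Pr(W2 = k)

less <- sum(pmf1 * (1 - p2)^(k + 1))  # Pr(W1 < W2) = sum_k Pr(W1 = k) Pr(W2 > k)
more <- sum(pmf2 * (1 - p1)^(k + 1))  # Pr(W1 > W2)
tie  <- sum(pmf1 * pmf2)              # Pr(W1 = W2)

denom <- p1 + p2 - p1 * p2
rbind(Sum     = c(less = less, tie = tie, more = more),
      Formula = c(less = p1 * (1 - p2), tie = p1 * p2, more = p2 * (1 - p1)) / denom)
```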


Simulations indicate this answer is correct

As a check, I simulated this game ten million times where your coin, with $p_1 = 10/11,$ has a slight edge over mine with $p_2=9/10.$ Here are the frequencies of the results compared to the formula:

             Lose   Tie    Win
Simulation 0.0827 0.826 0.0917
Theory     0.0826 0.826 0.0917

True, most of the time we tie (because both coins so strongly favor heads), but you do win noticeably more often than I do, despite the tiny difference in the coins.

Here is the R code for the simulation. It takes a few seconds to run.

```
p1 <- 10/11  # Your chances of heads
p2 <- 9/10   # My chances of heads
n <- 1e7     # Number of iterations

set.seed(17)
W1 <- rgeom(n, p1)  # Your numbers of tails before your first heads
W2 <- rgeom(n, p2)  # My numbers of tails before my first heads
Outcome <- ifelse(W1 < W2, "Win", ifelse(W1 > W2, "Lose", "Tie"))

# Theoretical probabilities, ordered alphabetically to match table(Outcome)
print(rbind(Simulation = table(Outcome) / n,
            Theory = c(Lose = p2 - p1*p2, Tie = p1*p2, Win = p1 - p1*p2) /
                     (p1 + p2 - p1*p2)),
      digits = 3)
```
whuber
  • As an alternative solution that doesn't even need the solution of a "simple (linear) equation", we can consider the (conditional) probabilities of what happens on the very first trial on which the event $(T,T)$ _does not occur_, which nonoccurrence happens with probability $p_1+p_2-p_1p_2$. The conditional probabilities of $(H,T),(T,H),$ and $(H,H)$ given that $(T,T)$ did not occur are \begin{align}P((H,T)\mid (T,T)^c)&=\frac{p_1(1-p_2)}{p_1+p_2-p_1p_2}\\P((T,H)\mid (T,T)^c)&=\frac{p_2(1-p_1)}{p_1+p_2-p_1p_2}\\P((H,H)\mid (T,T)^c)&=\frac{p_1p_2}{p_1+p_2-p_1p_2}\end{align} – Dilip Sarwate Oct 22 '20 at 12:02
  • @Dilip That is clever, it is well worth remembering, and obviously gives the right answers, but one step is missing: why should these conditional probabilities answer the question? In other words, what argument demonstrates that the agreement between these conditional probabilities and the correct answer is not just some kind of fluke? – whuber Oct 22 '20 at 12:25
  • @whuber We are looking for the first _non-occurrence_ of $(T,T)$, which is when the decision occurs: I win ($(H,T)$ occurs), or you win ($(T,H)$ occurs), or we tie ($(H,H)$ occurs). So, we are looking at a _reduced sample space_ $\Omega^\prime = \{(H,T), (T,H),(H,H)\}$ in which the probabilities of the outcomes are normalized to sum to $1$ by dividing the original probabilities (as in the original measure) by $P(\{(H,T),(T,H),(H,H)\})$ (as given by the original measure) to get the probabilities as defined by the new measure. In equally likely sample spaces, we divide integers; here, real numbers. – Dilip Sarwate Oct 22 '20 at 16:13
  • @whuber It is just a small generalization of the standard result that, on repeated independent trials of an experiment with mutually exclusive events $A$ and $B$, $$P\{A~\text{occurs before}~B\} = \frac{P(A)}{P(A)+P(B)},$$ to allow for the possibility that $A$ and $B$ are not mutually exclusive and so might possibly occur on the same trial. – Dilip Sarwate Oct 22 '20 at 16:21
  • @Dilip Those are such good explanations that it would be nice to see them elevated to an answer. – whuber Oct 22 '20 at 17:19
  • A clarification issue: there is a possibly misleading interpretation here of the recursive 'equation' solution for the memoryless Geometric distribution and of the associated probability of winning the game. My point: events where both players win (i.e., a draw, in my opinion) are apparently included. Proof: let p1 = p2 = 1/2, a perfectly fair game. The obvious answer for the chance of winning is 1/2; however, that is not what the equation's 'solution' actually gives (it produces a value of 1/3; for better understanding, see Dilip Sarwate's first alternate-solution comment). – AJKOER Oct 23 '20 at 12:46
  • @AJKOER _Your_ definition of "winning" is tossing a Head regardless of what the other person tosses and so $(H,H)$ is part of both "you win" and "I win". Thus, in the case of two fair coins, my equations as well as whuber's show that P(I win, you lose), P(you win, I lose) and P(both toss Heads) all have probability $\frac 13$. You partition P(both toss Heads) into two equal parts of probability $\frac 16$ and assign one part to you and one to me to come up with the answer that with fair coins, both players have equal chance of winning. $\frac 13 + \frac 16 = \frac 12$. – Dilip Sarwate Oct 24 '20 at 20:22
  • Alas, Dilip Sarwate: If we are speaking in general of 'games', including the likes of chess, a draw is never a win. Also, in a casino game of Blackjack, if both you and the Dealer get 21, don't expect any winnings either. Here a continuum of Pr(W1>W2) should not include draws, in my opinion, as is evident in any actual game involving wagers. Just use the amended last term [(1−p1)(1−p2)+p1p2]*Pr(You win), and you arrive at an answer where, in a fair game with p1=p2, one gets 1/2 (not 1/3). With my spreadsheet draw simulation, I return a blank cell (which is not counted in column statistics). – AJKOER Oct 28 '20 at 12:19

In accordance with whuber's suggestion, I am posting an extended version of some comments that I made on whuber's answer as a separate answer of my own.

The experiment consists of players A and B each (independently) tossing their individual coins that turn up Heads with probabilities $p_A$ and $p_B$ respectively. Repeated independent trials of this experiment are performed until at least one of A and B tosses a Head for the first time, at which point the game ends with A the winner if the outcome is $(H,T)$, B the winner if the outcome is $(T,H)$, and a tie if the outcome is $(H,H)$. The game ends on the very first trial on which the outcome is NOT $(T,T)$. Clearly, if $p_A=p_B=0$ (both players have two-tailed coins), the outcome of each trial is $(T,T)$ and the game never ends, and so to exclude this trivial case, we assume that $p_A$ and $p_B$ are not both $0$. If exactly one of $p_A$ and $p_B$ has value $0$, then with $\{X,Y\} = \{A, B\}$ where $p_X = 0$ and $p_Y > 0$, we can say that Y is guaranteed to win the game (ties are impossible), and it takes an average of $\frac{1}{p_Y}$ trials for Y to actually win the game by tossing a Head.

So, assuming that $p_A > 0$, $p_B > 0$, the game is guaranteed to end in a finite number of trials (cf. whuber's answer cited above). Because of independence, we can ignore all trials on which $(T,T)$ is the outcome and concentrate on the very first trial on which the outcome $(T,T)$ does not occur, meaning that the outcome is necessarily either $(H,T)$, in which case A wins, or $(T,H)$, in which case B wins, or $(H,H)$, in which case there is a tie. Note that the game ends at this point. So, all previous trials (if any) have resulted in $(T,T)$ and the current trial is the very first one on which the outcome is not $(T,T)$. Since the game ends at this point, there are no future trials to consider.

Given that the event $\{(H,T), (T,H), (H,H)\}$ has occurred, what is the conditional probability that the outcome is $(H,T)$ and so A wins? the conditional probability that the outcome is $(T,H)$ and so B wins? the conditional probability that the outcome is $(H,H)$ and so the game ends in a tie? We have \begin{align} P((H,T)\mid (T,T)^c) &= \frac{P(H,T)}{P(\text{at least one of A and B tosses a Head})}\\ &= \frac{p_A(1-p_B)}{p_A + p_B - p_Ap_B}\tag{1}\\ &= P(\text{A wins}),\\ P((T,H)\mid (T,T)^c) &= \frac{P(T,H)}{P(\text{at least one of A and B tosses a Head})}\\ &= \frac{p_B(1-p_A)}{p_A + p_B - p_Ap_B}\tag{2}\\ &= P(\text{B wins}),\\ P((H,H)\mid (T,T)^c) &= \frac{P(H,H)}{P(\text{at least one of A and B tosses a Head})}\\ &= \frac{p_Ap_B}{p_A + p_B - p_Ap_B}\tag{3}\\ &= P(\text{game is tied}). \end{align} But, as whuber cogently asked earlier, why am I claiming that the conditional probabilities computed in $(1), (2)$, and $(3)$ (note that they add up to $1$) are respectively equal to the unconditional probabilities of A winning, B winning, and the game being tied? Well, the game ends on the trial being considered and we are just looking at the reduced sample space $\Omega^\prime = \{(H,T), (T,H), (H,H)\}$ and the conditional probability measure that assigns the probabilities given by $(1), (2)$, and $(3)$ to these outcomes.

Alternatively, consider the mutually exclusive events $C = \{(H,T)\}$ and $D = \{(T,H),(H,H)\}$. It is a standard result in probability theory that on a sequence of independent trials, the (unconditional) probability that $C$ occurs before $D$ does (and so A wins) is given by \begin{align}P(\text{C occurs before D}) &= \frac{P(C)}{P(C)+P(D)}\\ &= \frac{p_A(1-p_B)}{p_A(1-p_B) + p_B(1-p_A) + p_Ap_B}\\ &= \frac{p_A(1-p_B)}{p_A + p_B - p_Ap_B} \end{align} which is the same value as in $(1)$. The careful but incredulous reader is invited to work out the other cases similarly to verify that the right sides of $(2)$ and $(3)$ do indeed give the respective unconditional probabilities of B winning, and of the game ending in a tie.
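
As a concrete check, here is a minimal R sketch (the values $p_A = 0.3$ and $p_B = 0.4$ are arbitrary) that plays the game literally, records the outcome of the first trial that is not $(T,T)$, and compares the observed frequencies with the values in $(1)$, $(2)$, and $(3)$:

```
# Play the game trial by trial; the game ends at the first trial whose
# outcome is not (T,T).  pA and pB are arbitrary example values.
set.seed(17)
pA <- 0.3; pB <- 0.4
n.games <- 1e5

play.one <- function() {
  repeat {
    a <- runif(1) < pA        # TRUE means A tosses Heads
    b <- runif(1) < pB        # TRUE means B tosses Heads
    if (a || b)               # first trial that is not (T,T): the game ends
      return(if (a && b) "Tie" else if (a) "A wins" else "B wins")
  }
}

outcomes <- replicate(n.games, play.one())
denom <- pA + pB - pA * pB
print(rbind(Simulation = table(outcomes) / n.games,
            Theory = c("A wins" = pA * (1 - pB),   # equation (1)
                       "B wins" = pB * (1 - pA),   # equation (2)
                       "Tie"    = pA * pB          # equation (3)
                       ) / denom),
      digits = 3)
```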

Dilip Sarwate
  • Just noticed your exposition; my new comment above applies. I really do not wish to pursue this topic further, given my disclosed financial interest in a provisional game patent. – AJKOER Oct 28 '20 at 12:29

As I indicated in a comment above, there is a cited path to the solution available (https://math.stackexchange.com/questions/2205734/difference-between-two-independent-geometric-distribution), courtesy of the Mathematics Stack Exchange site, on the topic 'Difference between two independent geometric distributions'.

However, I have also found another route that is worthy of mention, as it may have broader applications (that is, to distributions other than the Geometric). It is especially understandable for those acquainted with the Monte Carlo CDF-inversion method for generating random deviates. The method basically consists of equating a uniform deviate on (0,1) to the CDF and solving (directly if possible, or otherwise) to obtain X, the associated random deviate for the distribution of interest.

Note: I have constructed a spreadsheet with varying parameter values for ${p_1}$ and ${p_2}$ and feel confident that the methodology presented is sound in practice and theory, based on repeated runs of 1,000 generated pairs of uniform random deviates and the corresponding Geometric deviates produced (CDF reference here).

In the case of the Geometric, the math proceeds as follows:

$${U = CDF = 1 - (1-p)^X}$$

which can produce a pair of random Geometric deviates, starting from two Uniform random deviates, for two Geometric distributions with different parameters:

$${X1 = Ln(1-U1)/Ln(1-p_1)}$$

$${X2 = Ln(1-U2)/Ln(1-p_2)}$$

Note: for efficiency, and for mathematical convenience later, since both U1 and 1-U1 are uniformly distributed random deviates, one can replace 1-U1 with just U1. [EDIT] Although I originally did not round the generated deviates to integers, a subsequent investigation working with integer values for the Geometric deviates, and returning a blank cell for ties (so the observed simulation statistics were absent any incidents of draws), produced results actually closer to the expected theoretical value detailed below, albeit a little more variable owing to the varying, reduced sample size (from ignoring draws).
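
For concreteness, here is a minimal R sketch of the inversion step just described (the parameter value p = 0.3 is arbitrary, and `dgeom` is used only as a reference for comparison). Taking the ceiling of the inverted CDF yields genuinely integer-valued Geometric deviates on {1, 2, ...}; subtracting 1 converts them to the failures-before-first-success convention:

```
# Inverse-CDF generation of Geometric deviates: solve U = 1 - (1-p)^X for X
# and take the ceiling to get an integer deviate on {1, 2, ...} (trial of the
# first success).  Subtracting 1 gives the convention used by R's rgeom/dgeom.
set.seed(1)
p <- 0.3                                # arbitrary example parameter
U <- runif(1e5)
X <- ceiling(log(1 - U) / log(1 - p))   # trials convention, support {1, 2, ...}
W <- X - 1                              # failures convention

# Compare the simulated frequencies of W with the theoretical pmf
round(rbind(Inversion = table(factor(W, levels = 0:5)) / length(W),
            Theory    = dgeom(0:5, p)),
      3)
```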

So, how does this assist in the calculations of the respective probabilities involving the two Geometric variables? Quite directly, actually, by straight substitution:

$${ Pr[ X1 < X2] = Pr[ Ln(U1)/Ln(1-p_1) < Ln(U2)/Ln(1-p_2)]}$$

Taking the exponential of both sides of the inequality implies:

$${ Pr[ Exp(Ln(U1))*Exp(-Ln(1-p_1)) < Exp(Ln(U2))*Exp(-Ln(1-p_2)) ]}$$

Or:

$${ Pr[ U1/(1-p_1) < U2/(1-p_2)]}$$

And, most importantly, we can now introduce the topic of the Uniform Ratio Distribution, also discussed on Wikipedia:

$${ Pr[ (Z =) U1/U2 < (1-p_1)/(1-p_2)]}$$

is provided directly and facilely by the CDF of the Uniform Ratio Distribution in the cases where ${(1-p_1)/(1-p_2) < 1}$ (or greater), which equivalently corresponds to the cases where ${p_1 > p_2}$ (or ${p_1 < p_2}$).
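
For reference, the CDF being invoked here is that of the ratio $Z = U_1/U_2$ of two independent Uniform$(0,1)$ deviates,

$$\Pr[Z \le z] = \begin{cases} z/2, & 0 \le z \le 1,\\[2pt] 1 - \dfrac{1}{2z}, & z \gt 1,\end{cases}$$

so that, for example, $\Pr[\,U_1/U_2 < (1-p_1)/(1-p_2)\,] = \tfrac12 (1-p_1)/(1-p_2)$ whenever $p_1 \ge p_2$.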

Based on my simulation, I observed that the observed sample value became much closer to the theoretical value postulated by the Uniform Ratio distribution, on a starting simulation of 1,000 random deviate pairs, as the absolute difference between ${p_1}$ and ${p_2}$ increased.

[EDIT] To end confusion as to my claims and provide verification of reported accuracy, I now provide a link to a published Google Spreadsheet. It updates every 5 minutes from my master copy and is read-only, but fully populated with 1,000 rows. I can periodically recalculate to provide alternate simulation results. [EDIT] Link removed: the spreadsheet has since evolved into the basis of a Casino Game Provisional Patent (I may post more details in the future upon request).

AJKOER
  • Thank you for your replies! – Mohammed Oct 09 '20 at 03:01
  • This makes no sense at all, because in replacing a discrete distribution by a continuous one you *necessarily* reduce the chance of equality from a positive value to zero, which obviously is incorrect. – whuber Oct 09 '20 at 14:07
  • Per the edit / further work: surprisingly, truncating the deviates to integers produced better performance for discrete geometric simulations. The frequency of Pr[X1 > X2] was affirmed and supports the Uniform Ratio distribution application. No doubts; it is simple and apparently works nicely. In another worksheet I compared, for generated continuous exponentials, the observed probability with that obtained from the exponential difference X1 - X2 > 0, which is Laplace distributed, with predicted probabilities in agreement with the associated uniform ratio expectation, yielding equal performance. – AJKOER Oct 09 '20 at 20:29