
I'm really sorry if this question is too basic, but I've been looking for a while and haven't been able to find a convincing response. My statistics background is rather poor.

The geometric distribution is defined as the probability distribution of the number $X$ of Bernoulli trials needed to get one success (or of the number of failures before the first success), given a constant success probability $p$ for every trial.

Now suppose instead that the $p_i$ are iid random variables, with a different realization drawn for every single Bernoulli trial (as opposed to one per geometric realization, or a constant value), following a continuous distribution on the interval $[0,1]$ with $E[p_i]=\bar{p}$ known.

  1. My main question is: Is $X$ still distributed Geometric? If so, is it right to say $X\sim Geom(\bar{p})$?

  2. My intuition is that the problem can be reduced to the simpler (i.e. constant $p$) problem, since if $Y\sim Bernoulli(p_i)$, then $E(Y)=E(E(Y|p_i))=E(p_i)=\bar{p}$ by the law of total expectation. From here I could show by induction that $P(X=k)=P(Z=k)$ for every $k\in\mathbb{N}$ and $Z\sim Geom(\bar{p})$; I spell out the step I have in mind just after this list. But my probability skills won't let me confirm or reject this. Is this reasoning right?

  3. A complementary question is whether this is a well-known problem (or whether the solution uses a well-known theorem) with a name. I suspect the answer may sound obvious to someone with a better statistics background than mine.

  4. At the end of this post, I add sample code I used to simulate the problem. If $X$ is not Geometric, a final question would be: what is going on with my simulations that they can't rule it out? Have I faced a special case?
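To spell out the reduction I have in mind (this is the step I can't justify rigorously, not a verified proof): conditioning on the $p_i$ and using their independence,

$$P(X=k)=E\left[p_k\prod_{i=1}^{k-1}(1-p_i)\right]=E[p_k]\prod_{i=1}^{k-1}E[1-p_i]=\bar{p}\,(1-\bar{p})^{k-1},$$

which would be exactly the pmf of $Geom(\bar{p})$ when $X$ counts trials.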

There is a related question here, but the $p_i$ are deterministic, making it a different problem, I think.

Thanks in advance

Update

Thanks to whuber's comment and Glen_b's question, I realize I might have explained myself a bit ambiguously.

What I'm saying is that the $p_i$ are redrawn for each independent Bernoulli trial, not just for each realization of the Geometric variable (which would imply several trials with the same success probability).

The Beta-binomial distribution Wikipedia article suggests (although the language is ambiguous to me) that the success probability is the same for all $n$ trials. A mixture of geometric distributions would likewise imply a constant success probability from the first trial to the first success.

I also ran simulations. I know this does not prove anything, but for what it's worth, I report them here:

In pseudo-code:

    Let $p_i\sim G$               // some distribution on $[0,1]$
    Let $B_i\sim Bernoulli(p_i)$  // a new $p_i$ is drawn for every $B_i$

    my_clumsy_generator:
        $n\leftarrow 0$
        do:
            $t\leftarrow$ draw from $B_i$ (with a fresh $p_i$)
            $n\leftarrow n+1$
        while $t\neq 1$
        // Is $n$ distributed Geometric?

(The pseudo-code counts trials, while the Mathematica code below counts failures before the first success; the question is the same under either convention.)

Sample Wolfram Mathematica code:

f[] := RandomVariate[BetaDistribution[3, 2]]  (* a fresh p for every call *)
g[] := Module[{n = 0},
    (* count failures before the first success; a new p is drawn for every trial *)
    While[RandomVariate[BernoulliDistribution[f[]]] == 0, n++];
    n];

ress = Table[g[], {i, 1000000}];

(* fit a geometric parameter, then test the fit *)
pp = p /. FindDistributionParameters[ress, GeometricDistribution[p]]
DistributionFitTest[ress, GeometricDistribution[pp]]

An example run of this code gave pp=0.600311 and a p-value of 0.906769. Below is a resulting histogram for 1,000,000 values:

Sample histogram for the Mathematica code

  • I have tried a number of distributions (categorical, uniform, with different parameters) besides the beta shown.
  • $pp$ turns out to be consistently close to the mean of the mixing distribution (here Mean[BetaDistribution[3, 2]] = 3/5), and the p-value of the distribution fit test is always high, even at n=1,000,000 (I haven't rejected a single null hypothesis at 95% confidence).
Rafael
  • Even in the Bernoulli case, the analogous results do not hold. For instance, the [Beta-binomial distribution](https://en.wikipedia.org/wiki/Beta-binomial_distribution) arises when the $p_i$ are drawn independently from a Beta distribution. The Beta-Binomial is "overdispersed" compared to the Binomial: the variation in the $p_i$ creates more variation in the results than could be accounted for by sampling from a fixed Bernoulli. – whuber Mar 09 '18 at 21:51
  • @whuber would it hold under specific additional restrictions? (e.g. specific $p_i$ distributions/parametrizations?) I ran some simulations and it seemed to hold for specific cases – Rafael Mar 09 '18 at 22:10
  • The mean of a Geometric$(p)$ distribution is $\mu=1/p$ and its variance is $(1/p)^2-1/p=\mu^2-\mu.$ When you sum a bunch of these distributions the mean and variance of the sum are the sums of the means and variances, respectively. A little algebra should convince you that this mean and variance cannot stand in the same algebraic relationship unless all the $p_i$ are equal. Therefore, although when the $p_i$ don't vary much the resulting distribution may sort of *look* geometric, it won't be. – whuber Mar 09 '18 at 22:28
  • @whuber But is the beta-binomial drawing a $p_i$ for every trial, or one for every $n$ trials? The article says that at $n=1$ it reduces to the Bernoulli distribution... – Rafael Mar 12 '18 at 13:46
  • Think of it as a two-stage observational process. To make a single success/failure observation, first draw a value $p$ from a Beta$(\alpha,\beta)$ distribution. Fill an urn with balls in which a proportion $p$ are labeled "success" and the remainder "failure." Mix thoroughly and draw out one ball. For $n \gt 1,$ repeat from the very beginning (so that $p$ varies with each draw). At the end of $n$ such trials, count the number of successes observed: that's the Beta-Binomial outcome. – whuber Mar 12 '18 at 14:15
  • @whuber Thanks, I really appreciate your help! Pity that I can't enter the chat yet. The question then is why the results of my simulations aren't distinguishable from geometric (I can confirm that the $p_i$ are different for every trial). – Rafael Mar 12 '18 at 14:21
  • What method do you use to compare your results to a geometric distribution? – whuber Mar 12 '18 at 14:28
  • @whuber Pearson's chi-squared, which seems to be the only one available for this case in Mathematica (discrete distribution) – Rafael Mar 12 '18 at 14:36
  • Yes, but *how* did you do the comparison? Your remark about *Mathematica* is suspect, given the [large collection of distributions](http://reference.wolfram.com/language/guide/DiscreteUnivariateDistributions.html) and tests it supports. – whuber Mar 12 '18 at 15:21
  • @whuber `DistributionFitTest[ress, GeometricDistribution[pp]]` – Rafael Mar 12 '18 at 15:24
  • Thanks: I hadn't noticed the update, which clearly shows what you are doing. – whuber Mar 12 '18 at 15:33
  • @whuber update: I just tried the following: 1) 10,000 binomials with a fixed n=10 and a random p for each _binomial_; the resulting array fits a beta-binomial, not a binomial. 2) 10,000 sums of 10 Bernoullis with a random p for each _bernoulli_; the resulting array fits a binomial, not a beta-binomial, which is kind of consistent with my _reduction_ idea (a sketch of this second experiment follows the comments) – Rafael Mar 12 '18 at 15:34
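For reference, a minimal Mathematica sketch of the second experiment in that last comment (my reconstruction; it reuses f[] from above, and the Beta(3, 2) mixing distribution and sample sizes are illustrative assumptions):

trials = Table[RandomVariate[BernoulliDistribution[f[]]], {10000}, {10}];  (* a fresh p for every Bernoulli trial *)
sums = Total /@ trials;  (* 10,000 sums of 10 Bernoullis *)
DistributionFitTest[sums, BinomialDistribution[10, 3/5]]  (* 3/5 = Mean[BetaDistribution[3, 2]]; expect a high p-value *)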

3 Answers


My main question is: Is X still distributed Geometric?

No.

Indeed, the mean of the resulting distribution is not the same as the mean of a geometric with parameter equal to $E(p)$; this is because the mean of the geometric is not linear in $p$. The mean of the resulting distribution (counting trials) is instead $E(1/p)$, the reciprocal of the harmonic mean of the distribution of $p$; equivalently, it is the arithmetic mean of the means of the component geometric distributions.
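For instance, with a 50-50 mix of $p=1/3$ and $p=1/9$ (the case simulated below), counting trials so that a geometric with parameter $p$ has mean $1/p$:

$$E[1/p]=\tfrac{1}{2}\cdot 3+\tfrac{1}{2}\cdot 9=6,\qquad\text{whereas}\qquad\frac{1}{E[p]}=\frac{1}{\tfrac{1}{2}\left(\tfrac{1}{3}+\tfrac{1}{9}\right)}=\frac{1}{2/9}=4.5.$$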

But my probability skills won't let me confirm or reject this.

Simulation is sufficient to see it. Try simulating, say, 100000 values from either a geom(1/3) or a geom(1/9) with equal chance, and see that although the mean matches that of a geom(1/6), the proportion of 1's (or of 0's, if you index your geometric from 0) is wrong, and the sd is wrong.
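A minimal Mathematica sketch of that experiment (my addition; note that Mathematica's GeometricDistribution counts failures before the first success, so the relevant means are 2, 8 and 5 rather than 3, 9 and 6):

mix = Table[RandomVariate[GeometricDistribution[RandomChoice[{1/3, 1/9}]]], {100000}];  (* p is 1/3 or 1/9 with equal chance *)
{Mean[mix], StandardDeviation[mix]} // N  (* mean ~ 5 matches Geom(1/6), but sd ~ Sqrt[48] ~ 6.93 rather than Sqrt[30] ~ 5.48 *)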

Instead of the log-probabilities decreasing linearly, as they would for a geometric, they initially decrease at a rate similar to that of the component with the smaller mean, then bend around, and eventually, in the far tail, they decrease at a rate similar to that of the component with the larger mean.

However, it's relatively easy to do the algebraic calculations for the first point (or for the variance) for a simple case like that.

plot of exact probabilities for geom(1/3), geom(1/9) and a 50-50 mixture on the log scale, and beside that the observed proportions in a large simulation; the results look very similar

A complementary question is whether this is a well-known problem (or the solution uses a well-known theorem,) with a name. I suspect the answer may sound obvious to someone with a better statistics background than me.

Sure, people do this kind of thing with distributions; it has all manner of uses. Putting a distribution on a parameter yields a form of mixture distribution.

https://en.wikipedia.org/wiki/Mixture_distribution

Glen_b

Your algorithm independently draws a sequence of probabilities from random variables $P_i$ with distributions $F_i.$ Since these distributions are supported between $0$ and $1,$ they all have expectations; call them $p_i.$ It then observes a parallel sequence of independent Bernoulli$(P_i)$ variates $X_i,$ stopping the first time an outcome of $1$ is observed, and returns the number of zeros encountered along the way. Let's call this random variable $Y.$

Let $k$ be a possible value of $Y.$ Consider the survival function of $Y,$ given by the chance that $Y \ge k.$ Because (conditional on $(P)=(P_1,P_2,\ldots)$) the $X_i$ are independent, the chance that $Y$ equals or exceeds $k$ for any given set of probabilities is

$$\eqalign{ \Pr(Y \ge k \mid (P)) &= \Pr(X_1=0 \mid P_1) \Pr(X_2=0\mid P_2) \cdots \Pr(X_k=0\mid P_k)\\ &=(1-P_1)(1-P_2)\cdots(1-P_k). }$$

Because the $P_i$ are independent, the expectation (taken over the process $(P)$) is

$$\Pr(Y \ge k) = \mathbb{E}\left[\Pr\left(Y\ge k\mid (P)\right)\right] = \prod_{i=1}^k (1 - \mathbb{E}[P_i])=\prod_{i=1}^k (1-p_i).$$

Suppose now that all the expected probabilities $p_i$ are the same, equal to some common probability $\bar p.$ The preceding simplifies to

$$\Pr(Y \ge k) = (1-\bar p)^k.$$

That's a Geometric distribution.
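As a concrete check against the simulation in the question, where the probabilities were drawn from a Beta$(3,2)$ distribution with mean $3/5$:

$$\Pr(Y \ge k) = (1 - 3/5)^k = (2/5)^k,$$

a Geometric distribution with parameter $3/5,$ consistent with the fitted value pp = 0.600311 reported above.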


It may be instructive to contrast this procedure with the very similar one in which you draw a single $p$ at the outset of sampling the $X_i,$ effectively generating one realization of a Geometric distribution of parameter $p,$ and then repeat this for new, independent values of $p.$ The result is not a sample from a Geometric distribution, as you can see by considering this modification of the code:

g[] := RandomVariate[GeometricDistribution[f[]]]

Let's run it:

SeedRandom[17];
ress = Table[g[], {i, 1000}];
Histogram[ress]

Histogram of results

The tail is too long to be Geometric.

Replacing the While loop in g by direct generation of $Y$ via GeometricDistribution was essential, because sooner or later f[] will produce a very tiny value of $p$ and the loop will then run for an extremely long time before a success ($1$) is observed. Using a loop becomes too inefficient. But the contrast also helps us understand how the modified process differs from the original one: in the original, it's unlikely that the loop will run for very long (assuming the values of f[] aren't concentrated near $0$), because different probabilities are generated at each step, assuring it won't be caught trying to generate an extremely unlikely success.
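A quick numeric check (my addition, reusing the fit test from the question) confirms the visual impression that this mixture is not Geometric:

ppMix = p /. FindDistributionParameters[ress, GeometricDistribution[p]];
DistributionFitTest[ress, GeometricDistribution[ppMix]]  (* expect a very small p-value *)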

whuber
  • Rookie mistake on my part: $\mathbb{E}[\prod f]=\prod\mathbb{E}[f]$ for independent factors is called the _multiplication rule of independent events_. The question about the name of the equivalence remains, though: does the property Bernoulli($P_i$)$\equiv$Bernoulli($\bar{p}$) have a common name? – Rafael Mar 12 '18 at 16:29
  • It has no name I know of because it comes down to the fact that the *only* possible distributions with values in $\{0,1\}$ must be Bernoulli distributions, essentially by definition of a Bernoulli variable. Since the left hand side of this property has values only in $\{0,1\},$ it *must* be some Bernoulli distribution. – whuber Mar 12 '18 at 16:41

The most common continuous distribution with support between 0 and 1 is the beta distribution. When the $p$ in the geometric distribution is a beta variable, you end up with a geometric mixture distribution called the "beta-geometric" distribution, whose parameters can easily be derived. The mixing distribution need not be the beta, however: giving $p$ another distribution (suitably truncated or transformed to the unit interval, since $p$ must remain a valid probability), say an exponential, yields another geometric mixture, which we call the exponential-geometric mixture distribution. Its parameters can also be easily derived. For reference, search the University of Nairobi repository for many of these mixture distributions.
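For what it's worth, the beta-geometric pmf is a short computation (a sketch, counting failures before the first success, with $p\sim Beta(\alpha,\beta)$):

$$P(X=k)=\int_0^1 p\,(1-p)^k\,\frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha,\beta)}\,dp=\frac{B(\alpha+1,\beta+k)}{B(\alpha,\beta)},\qquad k=0,1,2,\ldots$$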