4

Suppose I have $N$ probabilities $(p_1, p_2, \ldots, p_N)$, where $p_i$ is the chance that the corresponding test was passed. How do I apply the Bernoulli distribution to determine the expected number of passes?

  • It's not completely clear how success is being "estimated" here. – Glen_b Aug 26 '15 at 22:50
  • I'll try to be clearer. I have a number of locations. I apply a test in each location. However, I only have a probability that the test passed. I want to estimate the number of passed tests, with some confidence interval. – Meghan Stephens Aug 27 '15 at 00:45
  • Thanks, yes, that's clearer now. I have attempted an edit to your question to perhaps make it clear in the question itself, but you may want to edit it further. Is this for a class? – Glen_b Aug 27 '15 at 01:47
  • Ha - no, it's not a homework question. It is for some work I am doing, and I have significantly reduced the problem so that it is a simple question. – Meghan Stephens Aug 27 '15 at 05:37
  • see also section 3 of http://stats.stackexchange.com/questions/160458/estimate-accuracy-of-an-estimation-on-poisson-binomial-distribution/164746#164746 –  Aug 27 '15 at 08:12

2 Answers

9

If $p_i$ is the probability that test $i$ passed, then we can model each test as a Bernoulli random variable:

$$X_i \sim Ber(p_i)$$

The expected number of passed tests is equal to:

$$E(\text{number passed})=E\left(\sum_{i=1}^N X_i\right)=\sum_{i=1}^N E(X_i)=\sum_{i=1}^N p_i$$

This is due to the linearity of expectation and the fact that a $Ber(p)$ random variable has expected value $p$.
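As a minimal sketch in R, assuming four made-up pass probabilities (the vector `ps` and the name `expected.passes` are illustrative, not from the question), the expected number of passes is just the sum of the probabilities:

```r
# Hypothetical pass probabilities for four tests (illustrative values only)
ps <- c(0.9, 0.5, 0.2, 0.8)

# By linearity of expectation, E(number passed) = sum of the p_i
expected.passes <- sum(ps)
print(expected.passes)
```

Note that this step does not require the tests to be independent; linearity of expectation holds either way.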


Per your response to Glen_b, the total number of passed tests follows a Poisson binomial distribution, provided the tests are independent.
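Since the comments ask for an interval around the number of passes, one possible approach is to compute the exact probability mass function of the count and read an interval off its cumulative distribution. The sketch below assumes independent tests; `poisbin.pmf` is a hypothetical helper written for this illustration (not a function from any package), and the probabilities in `ps` are made up.

```r
# Exact PMF of the number of passes, built by adding one Bernoulli at a time.
# 'poisbin.pmf' is a hypothetical helper name, not part of any package.
poisbin.pmf <- function(ps) {
  pmf <- 1  # before any test is considered, P(0 passes) = 1
  for (p in ps) {
    # the new test either fails (count unchanged) or passes (count shifts by 1)
    pmf <- c(pmf * (1 - p), 0) + c(0, pmf * p)
  }
  pmf  # pmf[k + 1] = P(exactly k passes), for k = 0, ..., length(ps)
}

ps  <- c(0.9, 0.5, 0.2, 0.8)  # hypothetical pass probabilities
pmf <- poisbin.pmf(ps)
cdf <- cumsum(pmf)

# Central 95% interval for the count: the smallest k whose cumulative
# probability reaches 0.025 and 0.975, respectively
lower <- which(cdf >= 0.025)[1] - 1
upper <- which(cdf >= 0.975)[1] - 1
print(c(lower = lower, upper = upper))
```

Strictly speaking this is a probability interval for the random count rather than a confidence interval for a parameter, and it treats the $p_i$ as known exactly.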

1

If you add up $n$ independent Bernoulli random variables with different success probabilities, then the sum has a Poisson binomial distribution.

If the success probabilities are $p_i$, then the mean of the Poisson binomial is $n \bar{p}$, where $\bar{p}=\frac{1}{n}\sum_i p_i$, and its variance is $n \bar{p} (1-\bar{p}) - n \times var(p)$ (note: $var(p)$ must have $n$ in the denominator, i.e. $var(p)=\frac{1}{n}\sum_i (p_i-\bar{p})^2$).
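To see where the variance formula comes from, here is a short derivation, assuming the Bernoulli variables are independent and using $var(p)=\frac{1}{n}\sum_i (p_i-\bar{p})^2=\frac{1}{n}\sum_i p_i^2-\bar{p}^2$:

$$Var\left(\sum_{i=1}^n X_i\right)=\sum_{i=1}^n p_i(1-p_i)=n\bar{p}-\sum_{i=1}^n p_i^2=n\bar{p}-n\left(var(p)+\bar{p}^2\right)=n\bar{p}(1-\bar{p})-n \times var(p)$$

The first equality uses independence (variances add) and the fact that a $Ber(p_i)$ variable has variance $p_i(1-p_i)$.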

Note that if $var(p)$ is relatively small, then the variance is approximately the binomial variance $n \bar{p} (1-\bar{p})$. This is expected, because a small $var(p)$ means that the success probabilities of the Bernoullis are more or less equal.

I have some code to simulate this:

# Function that simulates a Poisson binomial random variable
#
#   parameter 'ps' contains the success probabilities of the Bernoullis to add up
#   parameter 'n.sim' is the number of simulations
#
#   The return value is a list containing
#      - the simulated mean
#      - the simulated variance
#      - the true mean of the Poisson binomial, namely n x average(ps)
#      - the true variance of the Poisson binomial, namely
#        n x average(ps) x (1-average(ps)) - n x var(ps), with n in the denominator of var(ps)

simulate.Poisson_Binomial<-function(ps, n.sim=100000) {

  sum.all<-vector(mode="numeric", length=n.sim)
  for ( i in seq_len(n.sim) ) {
    # generate the random outcome for each Bernoulli
    random.outcome<-( runif(n=length(ps)) <= ps )

    # count the number of successes
    sum.all[i]<-sum(random.outcome)
  }

  ret<-list(sim.mean=mean(sum.all), 
            sim.var=var(sum.all),
            PoisBin.mean=length(ps)*mean(ps), 
            PoisBin.var=length(ps)*mean(ps)*(1-mean(ps))-(length(ps)-1)*var(ps))

  return(ret)
}


# Generate 50 Bernoulli success probabilities
set.seed(1)
N<-50
ps<-runif(n=N)

# do the simulation
simulate.Poisson_Binomial(ps=ps, n.sim=5e5)

The return value of the R function contains (length(ps)-1)*var(ps). This is because the R function var() has $(n-1)$ in the denominator, so $-n \times var(p)$ in the formula above should be 'translated' to -length(ps) * ( var(ps) * (length(ps)-1)/length(ps) ), which simplifies to -var(ps) * (length(ps)-1).
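As a quick sanity check of this translation, the following sketch (assuming the same seed and 50 uniform probabilities as in the simulation above; `var.direct` and `var.formula` are names introduced here for illustration) compares the formula against the direct sum of the individual Bernoulli variances:

```r
set.seed(1)
ps <- runif(n = 50)
n  <- length(ps)

# Direct variance of a sum of independent Bernoullis: sum of p_i * (1 - p_i)
var.direct <- sum(ps * (1 - ps))

# Formula from the answer, using R's var(), which divides by (n - 1)
var.formula <- n * mean(ps) * (1 - mean(ps)) - (n - 1) * var(ps)

# (n - 1) * var(ps) is the sum of squared deviations, i.e. n x var(p)
# when var(p) is defined with n in the denominator
ssq <- sum((ps - mean(ps))^2)

print(all.equal(var.direct, var.formula))
print(all.equal((n - 1) * var(ps), ssq))
```

Both comparisons should agree up to floating-point tolerance, which is why all.equal() is used rather than ==.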

See also the question "Intuitive explanation for dividing by $n-1$ when calculating standard deviation?"