4

Suppose that we have independent binomial variates with differing sizes and probabilities $X_i \sim \operatorname{Binomial}(n_i,p_i)$, and $Z = \sum_iX_i$ is the sum. I understand that $Z$ is distributed poisson-binomial, and approximations to this distribution have been discussed here before.

My question is what exact and approximate methods exist for estimating and generating variates from the distribution of $X$ given $Z$ (i.e. $\mathbb{P}(X=x | Z=z)$ )?

Update:
So I've made some progress here. Regarding sampling from the distribution, Chen (2000) notes the connection between this model and sampling with probability proportional to size via maximum entropy. Here is a bit of R code:

    library(sampling)
    # generate via rejection
    p <- c(.1, .05, .2)
    n <- c(10, 50, 30)
    z <- 10
    a <- rbinom(1000000, size=n[1], p=p[1])
    b <- rbinom(1000000, size=n[2], p=p[2])
    c <- rbinom(1000000, size=n[3], p=p[3])
    d <- cbind(a, b, c)
    s <- a+b+c
    d1 <- d[s==z, ]
    q <- p/(1-p)
    pr <- q/sum(q)
    
    #generate via PPS max-entropy
    cn <-matrix(NA, nrow=10000, ncol=3)
    for(ind in 1:10000){
      q <- p/(1-p)
      vals <- c(rep(q[1], n[1]), 
                rep(q[2], n[2]), 
                rep(q[3], n[3]))
      i <- as.logical(UPMEsfromq(UPMEqfromw(vals, z)))
      cn[ind, ] <- c(sum(vals[i]==q[1]), 
                    sum(vals[i]==q[2]), 
                    sum(vals[i]==q[3]))
      
    }
    colMeans(cn)
    colMeans(d1)
    
    round(prop.table(table(cn[, 2])), 4)
    round(prop.table(table(d1[, 2])), 4)

This seems to work, but I'm a little concerned that I am using $\pi_i = \frac{p_1}{1-p_i}$ for the inclusion probabilities, whereas in the paper we have $$\pi_i = \frac{p_1 R(n-1,\{i\}^c)}{(1-p_i)R(n,S)} \,.$$ I have not tried to calculate the $R$ functions yet to see if there is a difference, as it is somewhat complicated from a computational perspective.

kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467
tomriddle
  • 41
  • 2
  • 1
    When you say $X$, you mean $X_i$, right? You want to draw from the probability distribution $P(X_i=x|Z=z)$? Also, it would help if you described the problem a little more. For instance, you could just use very crude Monte Carlo acceptance/rejection methods if the $n_i$ are pretty small and there are not too many $X_i$ in the sum. How big are the $n_i$ and how many terms in the sum defining $Z$? – Bill May 09 '13 at 15:34
  • 1
    The $n_i$s are between 1 and 200. I am interested in $X = (X_1,X_2,...,X_k)$, where $k$ is around 5 to 20. The $p_i$ for my problem are not necessarily small. – tomriddle May 09 '13 at 17:48

0 Answers0