7

Suppose that I have $X_1,X_2,X_3,...X_n$ iid random variables from a Poisson distribution of parameter $\lambda$. Given that $X_1 +X_2+X_3 +...+X_n = t$, what is the probability that exactly $k$ of $X_1,X_2,X_3,...X_n$ are zero?

--

My approach: I started out by considering the joint probability mass function where $X_1 +X_2+X_3 +...+X_n = t$ and $k$ of $X_1,X_2,X_3,...X_n$ is zero but I do not know how to proceed from here. If I use a binomial model to calculate the probability of having k number of zeros, I do not know how to impose the constraint on the sum of $X_n$.

P.Windridge
  • 2,075
  • 12
  • 13
The Yellow
  • 73
  • 3

2 Answers2

3

$Y = X_1+X_2+\cdots + X_n$ is a Poisson random variable with parameter $n\lambda$. So you can write down an expression for $P(Y=t)$.

There are $\binom{n}{k}$ choices for a set of $k$ variables that are zero. Pick a specific set. Then, the sum of the complementary set of variables is a Poisson random variable $Z$ with parameter $(n-k)\lambda$, and $Z$ is independent of the chosen $k$ variables. So, you can use independence to write down expressions for

$P(Z=t)$,

$P($chosen $k$ variables are $0)$,

$P($chosen $k$ are zero AND $Z=t) = P($chosen $k$ are zero AND $Y=t)$.

Can you take it from here? There shouldn't be any binomial distributions involved...

Dilip Sarwate
  • 41,202
  • 4
  • 94
  • 200
  • What if $Z$ has zeros in it? – The Yellow Oct 04 '17 at 21:26
  • What about it? $Z$ is the sum of Poisson random variables and if some of them are 0, the rest must be larger if the sum is to be $t$. – Dilip Sarwate Oct 04 '17 at 22:22
  • I only want k variables that are zero. "Poisson random variable Z with parameter (n−k)λ" -> it is possible that the complementary set of variables could contain zeros as well, in which case I have more than k zeros for $X_1,X_2,X_3,...X_n$ – The Yellow Oct 04 '17 at 23:05
  • P(k zero|t) = p(t|k zero) * P(k zero)/P(t) .... This strategy works for a *specific* set of $k$ variables being equal to zero. $$P(\text{specific k zero} \vert t) = \frac{ \left(\frac{(n-k)\lambda^t e^{-(n-k)\lambda t}}{t!}\right) \left( \frac{\lambda^0 e^{-\lambda t}}{0!} \right)^k }{\left(\frac{n\lambda^t e^{-n\lambda t}}{t!}\right)} = \left( \frac{n-k}{n} \right)^t$$ However, what if we want to know *any set* of $k$ variables being equal to zero? This problem seems more difficult. (I am not sure whether this *any* instead of *specific* is the question) – Sextus Empiricus Oct 05 '17 at 02:14
  • 1
    This question is about having k variables equal to zero. It is not a specific set of k variable. The question that I have regarding the above answer is that what if the $X_i$s assumed to be non-zero take on a zero value. More specifically, in "$P($chosen $k$ are zero AND $Y=t)$.", what if $X_i$ in $Y$ takes on a zero value, in which case I would have more than k zeros in total. – The Yellow Oct 05 '17 at 04:00
  • @DilipSarwate - in your approach, $Z$ should be something like the sum of Poisson random variables each *conditioned to be non-zero*? – P.Windridge Oct 05 '17 at 21:05
2

Let $Y:=X_1+\ldots+X_n$. Observe that the distribution of $(X_1,\ldots,X_n)$ conditional on $Y = t$ is multinomial (exercise). This gives a conceptually easier way to think about the problem- you have $n$ boxes and throw $t$ balls in them at random. What is the probability that $k$ are empty?

Well, first of all, there are $n^t$ ways of throwing the $t$ balls in the $n$ boxes with no restrictions.

Now it gets a little more complicated, even though we are just counting stuff. There are $\binom{n}{k}$ ways of choosing the $k$ boxes that stay empty. We are then left with $t$ balls to throw into the $n-k$ remaining boxes, such that each box is not empty. You can do this by inclusion/exclusion, just like in the proof of the Stirling number https://math.stackexchange.com/questions/550256/stirling-numbers-of-second-type .

Combining these ingredients gives the desired probability, $$ \frac{1}{n^t}\binom{n}{k}\sum_{j=0}^{n-k} (-1)^{n-k-j} \binom{n-k}{j} j^t, $$ $t \ge n-k$, $n \ge k$.

Note that $\lambda$ does not feature in the answer.


Out of interest and as a quick exercise I coded this (borrowing a Stirling number function I found with Google) to see what the answer looks like:

##-- Stirling numbers of the 2nd kind
##-- (Abramowitz/Stegun: 24,1,4 (p. 824-5 ; Table 24.4, p.835)

##> S^{(m)}_n = number of ways of partitioning a set of $n$ elements into $m$
##> non-empty subsets

Stirling2 <- function(n,m)
{
  ## Purpose:  Stirling Numbers of the 2-nd kind
  ##        S^{(m)}_n = number of ways of partitioning a set of
  ##                      $n$ elements into $m$ non-empty subsets
  ## Author: Martin Maechler, Date:  May 28 1992, 23:42
  ## ----------------------------------------------------------------
  ## Abramowitz/Stegun: 24,1,4 (p. 824-5 ; Table 24.4, p.835)
  ## Closed Form : p.824 "C."
  ## ----------------------------------------------------------------

  if (0 > m || m > n) stop("'m' must be in 0..n !")
  k <- 0:m
  sig <- rep(c(1,-1)*(-1)^m, length= m+1)# 1 for m=0; -1 1 (m=1)
  ## The following gives rounding errors for (25,5) :
  ## r <- sum( sig * k^n /(gamma(k+1)*gamma(m+1-k)) )
  ga <- gamma(k+1)
  round(sum( sig * k^n /(ga * rev(ga))))
}


pmf<-function(n,t,k) {
  if (t >= (n-k) & n >= k) {
    (choose(n,k) * factorial(n-k) * Stirling2(t,n-k) )/(n^t)
  } else {
    0
  }
}


lambda <- 1
n <- 10
reps <- 500000
set.seed(2017)
X <- matrix(ncol=n,nrow=reps,data=rpois(n*reps,lambda))

K <- apply(X, 1,function(x){sum(x == 0)})
hist(K)

# restrict only to those that sum to t
Y<-rowSums(X)


t<-8
G<- (Y == t)
sum(G)

k <- 5
#head(X[which(K==k),])
#head(Y[which(K==k)])
#head(X[G,])
#head(Y[G])

posskvalues <- (n-t):n
nk <- length(posskvalues)
empP <- numeric(nk)
thP <- numeric(nk)

for(i in 1:nk) {
  k <- posskvalues[i]
# sum(K[G] == k)
  empP[i] <- sum(K[G] == k)/sum(G)
  thP[i] <- pmf(n,t,k)
}

plot(posskvalues,empP,main=paste("n=",n,", t=",t))
points(posskvalues,thP,pch="x")

enter image description here

P.Windridge
  • 2,075
  • 12
  • 13