
I am looking for the maximum likelihood estimator (MLE) for the Poisson-binomial distribution. I understand the derivation of the MLE for a Poisson distribution and a binomial distribution, but I am unable to derive the MLE equation for the Poisson-binomial distribution.

kjetil b halvorsen
  • Check the discussion in comments: https://stats.stackexchange.com/q/237737/35989 – Tim Feb 21 '18 at 11:34

1 Answer


Consider a Poisson-binomial distribution with probability parameter $\boldsymbol{\theta} \equiv (\theta_1, ..., \theta_m) \in [0,1]^m$. The probability mass function for this distribution is given by:

$$p(k | \boldsymbol{\theta}) = \sum_{\mathcal{A} \in \mathfrak{P}_k(m)} \Bigg( \prod_{j \in \mathcal{A}} \theta_j \Bigg) \Bigg( \prod_{j \notin \mathcal{A}} (1-\theta_j) \Bigg),$$

where $\mathfrak{P}_k(m)$ is the class of all subsets of $k$ labels from $\{ 1, 2, ..., m \}$. The sampling density is clearly invariant to permutations of the probability vector $\boldsymbol{\theta}$, and so this parameter is unidentifiable in the distribution. The minimal sufficient parameter for this distribution is the vector of ordered probability values $\boldsymbol{\theta}_*= (\theta_{(1)} \geqslant ... \geqslant \theta_{(m)})$. This means that the MLE for $\boldsymbol{\theta}$ cannot be estimated uniquely, and for any MLE vector, any permutation of that vector will also be an MLE.
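For small $m$, this mass function can be evaluated directly from the subset-sum definition above. A minimal sketch in Python (exponential in $m$, one term per subset, so only viable for small $m$; the function name `pb_pmf` is my own, not from any library):

```python
# Sketch only: evaluate the Poisson-binomial PMF directly from the
# subset-sum definition.  Exponential in m, so only for small m.
from itertools import combinations

def pb_pmf(k, theta):
    """P(K = k) for a Poisson-binomial with success probabilities theta."""
    m = len(theta)
    if not 0 <= k <= m:
        return 0.0
    total = 0.0
    for A in combinations(range(m), k):    # each subset of k labels
        term = 1.0
        for j in range(m):
            term *= theta[j] if j in A else 1.0 - theta[j]
        total += term
    return total
```

For example, with $\boldsymbol{\theta} = (0.6, 0.5, 0.4)$ the four probabilities $p(0), ..., p(3)$ sum to one, and $p(0 | \boldsymbol{\theta}) = 0.4 \cdot 0.5 \cdot 0.6 = 0.12$.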


Setting aside this identifiability issue, we can obtain equations for the MLE. Letting $\boldsymbol{\theta}_{-a} \in [0,1]^{m-1}$ be the probability parameter excluding the element $\theta_a$, we have the recursive formula:

$$p(k | \boldsymbol{\theta}) = \theta_a p(k-1 | \boldsymbol{\theta}_{-a}) + (1-\theta_a) p(k | \boldsymbol{\theta}_{-a}).$$

We therefore have the useful preliminary result:

$$\frac{\partial p}{\partial \theta_a} (k | \boldsymbol{\theta}) = p(k-1 | \boldsymbol{\theta}_{-a}) - p(k | \boldsymbol{\theta}_{-a}).$$
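Both identities are easy to check numerically. A rough sketch, where `pb_pmf` is a hypothetical helper that computes the PMF by the standard convolution (dynamic-programming) recursion rather than by subset enumeration:

```python
# Sketch only: numerically check the recursion and the derivative identity.

def pb_pmf(k, theta):
    """Poisson-binomial PMF via the standard convolution recursion."""
    probs = [1.0]                          # distribution over 0 trials
    for t in theta:
        new = [0.0] * (len(probs) + 1)
        for j, p in enumerate(probs):
            new[j] += (1.0 - t) * p        # trial fails
            new[j + 1] += t * p            # trial succeeds
        probs = new
    return probs[k] if 0 <= k <= len(theta) else 0.0

theta = [0.6, 0.5, 0.4]
a = 0
rest = theta[:a] + theta[a + 1:]           # theta with element a removed

# recursion: p(k|theta) = theta_a p(k-1|theta_-a) + (1-theta_a) p(k|theta_-a)
for k in range(len(theta) + 1):
    rhs = theta[a] * pb_pmf(k - 1, rest) + (1 - theta[a]) * pb_pmf(k, rest)
    assert abs(pb_pmf(k, theta) - rhs) < 1e-12

# derivative: dp/dtheta_a (k|theta) = p(k-1|theta_-a) - p(k|theta_-a),
# checked by a central finite difference (p is linear in theta_a, so the
# difference quotient is exact up to rounding)
eps = 1e-6
for k in range(len(theta) + 1):
    up, down = theta[:], theta[:]
    up[a] += eps
    down[a] -= eps
    fd = (pb_pmf(k, up) - pb_pmf(k, down)) / (2 * eps)
    assert abs(fd - (pb_pmf(k - 1, rest) - pb_pmf(k, rest))) < 1e-6
```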

Now, given the observed vector $\boldsymbol{k} \equiv (k_1, ..., k_n)$ taken from IID draws from the Poisson-binomial distribution, we have log-likelihood function $l_{\boldsymbol{k}} (\boldsymbol{\theta}) = \sum_{i=1}^n \ln p(k_i | \boldsymbol{\theta})$ which gives us the score:

$$\frac{\partial l_\boldsymbol{k}}{\partial \theta_a} (\boldsymbol{\theta}) = \sum_{i=1}^n \frac{\partial}{\partial \theta_a} \ln p(k_i | \boldsymbol{\theta}) = \sum_{i=1}^n \frac{p(k_i-1 | \boldsymbol{\theta}_{-a}) - p(k_i | \boldsymbol{\theta}_{-a})}{p(k_i | \boldsymbol{\theta})}.$$

The MLE occurs at any point $\hat{\boldsymbol{\theta}}$ satisfying:

$$\sum_{i=1}^n \frac{p(k_i-1 | \hat{\boldsymbol{\theta}}_{-a}) - p(k_i | \hat{\boldsymbol{\theta}}_{-a})}{p(k_i | \hat{\boldsymbol{\theta}})} = 0 \quad \text{for all } a=1,2,...,m.$$

We have already noted that the MLE is invariant to permutations, and therefore we only require a "representative" of the class of permutations. We can obtain this by imposing the additional order constraint $\theta_{1} \geqslant ... \geqslant \theta_{m}$. For $n \geqslant m$, this set of $m$ score equations, together with the ordering constraint, should yield a unique solution for $\hat{\boldsymbol{\theta}}$. This solution represents the class of up to $m!$ permutations of the vector, all of which are MLEs.
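Since the score equations have no closed-form solution, in practice one solves them numerically. A rough sketch using projected gradient ascent on the log-likelihood (all names here are hypothetical helpers, not from any library; `pb_pmf` computes the PMF by the standard convolution recursion, and `score` implements the derivative formula above):

```python
# Sketch only: solve the score equations by projected gradient ascent.

def pb_pmf(k, theta):
    """Poisson-binomial PMF via the standard convolution recursion."""
    probs = [1.0]                          # distribution over 0 trials
    for t in theta:
        new = [0.0] * (len(probs) + 1)
        for j, p in enumerate(probs):
            new[j] += (1.0 - t) * p        # trial fails
            new[j + 1] += t * p            # trial succeeds
        probs = new
    return probs[k] if 0 <= k <= len(theta) else 0.0

def score(theta, data):
    """Gradient of the log-likelihood, using the derivative identity above."""
    grad = []
    for a in range(len(theta)):
        rest = theta[:a] + theta[a + 1:]   # theta with element a removed
        grad.append(sum((pb_pmf(k - 1, rest) - pb_pmf(k, rest))
                        / pb_pmf(k, theta) for k in data))
    return grad

def fit(data, m, steps=2000, lr=0.01):
    """Gradient ascent from a symmetric start, clipped to stay inside (0, 1)."""
    theta = [0.5] * m
    for _ in range(steps):
        theta = [min(max(t + lr * g, 1e-6), 1.0 - 1e-6)
                 for t, g in zip(theta, score(theta, data))]
    return sorted(theta, reverse=True)     # ordered representative of the MLE class
```

Note that from a symmetric starting point the iterates stay symmetric, so `fit([1, 2, 3], 3)` converges to the symmetric stationary point $\hat{\theta}_a \approx \bar{k}/m = 2/3$ for all $a$ (the binomial fit); different starting values would be needed to explore any non-symmetric critical points.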

Ben
  • How can $n \geqslant m$? I didn't understand the use of $p(k|\hat{\theta}_{-a})$ in the MLE equation, as it will always be 0. Can you elaborate? So $n \leqslant m$? – GOURISH GOUDAR Mar 30 '18 at 14:47
  • 1
    When you are looking for an MLE for a distribution, you are generally dealing with a situation where you observe multiple IID draws from that distribution. In this exposition of the problem, each draw from the distribution has $m$ trials, and you are taking $n$ IID draws from the distribution. There is no restriction on $m$ and $n$ in this case; the number of draws from the distribution can exceed the number of trials in any given draw. – Ben Mar 30 '18 at 22:01
  • Maximisation of the likelihood is done using the usual calculus technique of finding the critical points. These occur at the points where the derivative of the log-likelihood is zero (the score equation). Because the parameter is a vector with length $m$, the score equation is a set of $m$ equations. In this particular case, the derivation of these derivatives show that the score equation involves terms of the form $p( k | \hat{\boldsymbol{\theta}}_{-a} )$. Those individual terms are not zero; they are probabilities of draws from a smaller Poisson-Binomial distribution. – Ben Mar 30 '18 at 22:05
  • So in the above equation MLE occur at $\hat\theta$, then what is exact meaning of $\hat\theta_{-a}$ ?? Can you explain with simple example? – GOURISH GOUDAR Apr 01 '18 at 12:11
  • As a simple example, I will take the input as $\theta=(0.6,0.5,0.4)$, $n=3$, and let $k=(1,2,3)$. I implemented the above equation in R, but I didn't get an estimator (the result was not equal to 0). I'm confused: let $p=(0.5,0.4,0.3)$ be my data, and I found $P(X=k)$ for $k=1,2,3$. What will my parameter $\theta$ be in this case? The MLE has to be given by the above equation, right? How does it find the $\hat \theta$ that maximizes $P(X=k)$ in this example? $\hat \theta$ should be found by the equation itself; if not, do I have to check infinitely many combinations of probabilities in $[0,1]$ to produce 0 as output? – GOURISH GOUDAR Apr 01 '18 at 12:47
  • In this example, $\theta_{-1} = (0.5, 0.4)$, $\theta_{-2} = (0.6, 0.4)$ and $\theta_{-3} = (0.6, 0.5)$, so each of the distributions using these parameters would involve two categories (i.e., $m=2$). Your data would need to be a set of non-negative *integer* values $k_1, ..., k_n$. It is unclear what your purported data vector $p$ is referring to, since there cannot be non-integer data values. – Ben Apr 01 '18 at 22:29
  • Thanks for the explanation. I wanted to know how to choose $\theta$ values so that I can get an MLE for, say, $k=\{1,2,3\}$. – GOURISH GOUDAR Apr 02 '18 at 01:11
  • As per my understanding, the above equation only verifies whether given input parameters $\theta$ are an MLE, but doesn't find one! So the equation just says whether the input parameters are an MLE or not. The question is: how can I find the MLE? It is impossible to choose random values as parameters and check whether they satisfy the MLE equation, right? – GOURISH GOUDAR Apr 02 '18 at 06:56
  • Okay, I see what you're saying. Yes, the above equations give the first-order conditions (FOCs) for the MLE, but these do not have an explicit solution. In these cases you generally find the solution through iterative methods where you pick a starting value and iterate towards the solution to the FOCs (e.g., Newton-Raphson iteration, steepest descent iteration, etc.). It is probably worth doing a general calculus primer involving practice problems with no explicit solutions, to practice these methods. – Ben Apr 02 '18 at 07:55
  • If you would like more detail on using these procedures on the present maximisation problem, I suggest posing it as a new question on math.stackexchange.com. – Ben Apr 02 '18 at 07:57