
I have a multinomial distribution with four outcomes, with probability mass function:

$$p(x_1,x_2,x_3,x_4)=\frac{n!}{x_1!x_2!x_3!x_4!}p_1^{x_1}p_2^{x_2}p_3^{x_3}p_4^{x_4}, \sum_{i=1}^4x_i=n, \sum_{i=1}^4p_i=1$$

The probabilities are related to a single parameter $0\le\theta\le 1$ by
\begin{align*} p_1&=\frac{1}{2}+\frac{1}{4}\theta\\ p_2&=\frac{1}{4}-\frac{1}{4}\theta\\ p_3&=\frac{1}{4}-\frac{1}{4}\theta\\ p_4&=\frac{1}{4}\theta \end{align*}

If we have an observation $x=(x_1,x_2,x_3,x_4)$, the log-likelihood is:

$$l(\theta)=x_1\log(2+\theta)+(x_2+x_3)\log(1-\theta)+x_4\log(\theta)+c$$

Also, I will suppose $x=(125,18,20,34).$

1) I start by finding the MLE of $\theta$ by directly maximizing the log-likelihood. I took the derivative of the log-likelihood with respect to $\theta$ and set it equal to zero: \begin{align*} \frac{x_1}{2+\theta}-\frac{x_2+x_3}{1-\theta}+\frac{x_4}{\theta}&=0 \\ \frac{125}{2+\theta}-\frac{38}{1-\theta}+\frac{34}{\theta}&=0 \\ 197\theta^2-15\theta-68 &=0 \end{align*} Using the quadratic formula I get $\theta \in \{0.6268, -0.5507\}$. $\theta$ cannot be negative here, and the log-likelihood tends to $-\infty$ as $\theta\to 0$ or $\theta\to 1$, so the boundary cannot give the maximum; the MLE is therefore $\hat\theta=0.6268$.

I do not know whether I have derived the MLE correctly by maximizing the log-likelihood. If it is correct, I also do not know whether it is appropriate to report both roots of the quadratic (the "plus or minus" form) or only the admissible one.
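As a quick sanity check (this is just my own verification in R, assuming the observed-data log-likelihood above is correct), I can maximize $l(\theta)$ numerically and compare it with the positive root of the quadratic:

x = c(125, 18, 20, 34)
loglik = function(theta) x[1]*log(2 + theta) + (x[2] + x[3])*log(1 - theta) + x[4]*log(theta)
optimize(loglik, interval = c(0, 1), maximum = TRUE)$maximum   # numerical maximizer, about 0.6268
(15 + sqrt(15^2 + 4*197*68))/(2*197)                           # positive root of 197*theta^2 - 15*theta - 68 = 0

Both give roughly the same value, which reassures me a little.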

2) Now I would like to compare what I got in (1) with the EM algorithm. To do so, I will consider a multinomial with five classes, formed from the original multinomial by splitting the first class into two with probabilities $\frac{1}{2}$ and $\frac{\theta}{4}$. The original variable $x_1$ is split as $x_1=x_{11}+x_{12}$. We would then have an MLE of $\theta$ by considering $x_{12}+x_4$ to be a realization of a binomial with $n=x_{12}+x_4+x_2+x_3$ and $p=\theta$. However, we do not know $x_{12}$, and the complete-data log-likelihood is:

$$l_c(\theta)=(x_{12}+x_4)\log(\theta)+(x_2+x_3)\log(1-\theta)$$
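If I understand correctly, if $x_{12}$ were known, maximizing this complete-data log-likelihood in $\theta$ would simply give the binomial proportion

$$\hat\theta=\frac{x_{12}+x_4}{x_{12}+x_2+x_3+x_4},$$

so I suspect the M-step amounts to using this formula with $x_{12}$ replaced by some expected value.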

I would like to develop an EM algorithm for estimating $\theta$. I am told that I should be able to combine the E-step and M-step together, i.e., $\hat\theta^{(t+1)}$ can be expressed in terms of $\hat\theta^{(t)}$.

I am very stuck on how to conceptualize this problem. I have read about the EM algorithm, including the famous Nature article (the one with a numerical example for understanding Expectation-Maximization).

All I can say right now is that the binomial must follow:

${n \choose k}p^k(1-p)^{(n-k)}$

and $n=x_{12}+72$, since $x_2+x_3+x_4=18+20+34=72$.

But I am very lost as to what to do for the expectation and maximization steps! In fact, I want to implement this in R, and all I have so far is:

x = c(125, 18, 20, 34)   # observed counts (x1, x2, x3, x4)
t = 0.5                  # initial guess for theta
l = (x12 + x[4])*log(t) + (x[2] + x[3])*log(1 - t)   # fails: x12 is not known

And already I am stuck because I do not have x12, and I am guessing that I should start t (theta) at a default value of 0.5. Any advice on how to start the algorithm and how to perform the E and M steps would really help me!

Green Stone
  • For your part (1), check the second derivative to see which root gives a maximum. – Hemant Rupani Apr 21 '15 at 06:47
  • @Xi'an I'm not sure what you mean? Do you mean the solution should be available anyway since it is classic? Thank you. – Green Stone Apr 21 '15 at 13:08
  • (1) Is this question related to a course, homework, in any way? It would then deserve the self-study tag. (2) This example is on page 2 of the original EM paper by Dempster, Laird, and Rubin (1977). It is also entirely processed as Example 5.21 in [our book with George Casella](http://www.amazon.com/gp/product/1441919392?ie=UTF8&tag=chrprobboo-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=1441919392). So I wonder what else you need. – Xi'an Apr 21 '15 at 13:24
  • About the direct maximisation of the likelihood: since you have two possible values for $\theta$, what prevents you from comparing the numerical value of the likelihoods at those two values? – Xi'an Apr 21 '15 at 14:33
  • @Xi'an Thank you for the helpful resources! I was able to implement the EM algorithm in R and got the same result as the Dempster et al. paper, 0.62682. However, the answers I got for the first part of my problem were 1.506 and -0.0588. I don't know how I can compare my part (1) with part (2). – Green Stone Apr 21 '15 at 21:06
  • @HemantRupani Thank you! I have now tried to take the second derivative, and got $\frac{-125}{(2+\theta)^2} + \frac{38}{(\theta-1)^2} - \frac{34}{\theta^2}$. However, this led to a quartic equation, $159\theta^4+68\theta^3-265\theta^2-288\theta-16=0$, and then $\theta$ values of 1.506 and -0.0588. I am not sure if my approach is accurate? – Green Stone Apr 21 '15 at 21:32
  • @Joseph I corrected MLE in your Q. Just review it :) – Hemant Rupani Apr 21 '15 at 21:47
  • @Hemant I fixed your correction. – whuber Apr 21 '15 at 22:28
  • @HemantRupani: Thank you very much for your help! I just want to ask one thing... Why did you do -(x1+x2)/(1-theta) instead of +(x1+x2)/(2-theta)? – Green Stone Apr 22 '15 at 13:49
  • http://en.m.wikipedia.org/wiki/Chain_rule – Hemant Rupani Apr 22 '15 at 14:07

1 Answer


Extracted from our book, Monte Carlo Statistical Methods (except for the sentence in italics):

A classic (perhaps overused) example of the EM algorithm is the genetics problem (see Rao (1973), Dempster, Laird and Rubin (1977)), where observations $(x_1,x_2,x_3,x_4)$ are gathered from the multinomial distribution

$$ \mathfrak{M}\left( n;{1\over 2} + {\theta \over 4}, {1\over 4}(1-\theta), {1\over 4}(1-\theta), {\theta \over 4}\right). $$

Estimation is easier if the $x_1$ cell is split into two cells, so we create the augmented model

$$ (z_1,z_2,x_2,x_3,x_4) \sim \mathfrak{M}\left( n;{1\over 2}, {\theta \over 4}, {1\over 4}(1-\theta), {1\over 4}(1-\theta), {\theta \over 4}\right), $$

with $x_1=z_1+z_2$. [This means that $(Z_1,Z_2)$ is a Multinomial $\mathfrak{M}\left( x_1; {1\over 2}, {\theta \over 4}\right)$, therefore that

$$\mathbb{E}[Z_2\mid X_1=x_1]=\dfrac{\theta/4}{1/2+\theta/4}\,x_1=\dfrac{\theta}{2+\theta}\,x_1\,.\Big]$$

The complete-data likelihood function is then simply $\theta^{z_2+x_4} (1-\theta)^{x_2+x_3}$, as opposed to the observed-data likelihood function $(2+\theta)^{x_1}\theta^{x_4} (1-\theta)^{x_2+x_3}$. The expected complete log-likelihood function is

\begin{eqnarray*} &&\mathbb{E}_{\theta_0} [ (Z_2+x_4) \log\theta + (x_2+x_3) \log (1-\theta) ] \\ &&\qquad = \left( {\theta_0 \over 2+\theta_0}x_1 + x_4 \right) \log\theta + (x_2+x_3) \log (1-\theta) \;, \end{eqnarray*}

which can easily be maximized in $\theta$, leading to

$$ \hat\theta_1 = \frac{\displaystyle{{\theta_0\,x_1\over 2+\theta_0}} + x_4}{\displaystyle{{\theta_0\,x_1\over 2+\theta_0}} +x_2+x_3+x_4} \;. $$
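For reference, a minimal R sketch of this recursion (the code and the starting value $\theta^{(0)}=0.5$ are not from the book; they only illustrate the update above):

x = c(125, 18, 20, 34)   # observed counts (x1, x2, x3, x4)
theta = 0.5              # arbitrary starting value in (0, 1)
for (i in 1:50) {
  z2 = theta/(2 + theta) * x[1]                    # E-step: E[Z2 | x1, theta]
  theta = (z2 + x[4])/(z2 + x[2] + x[3] + x[4])    # M-step: maximize expected log-likelihood
}
theta   # converges to about 0.6268, matching the direct MLE

The fixed point of this map is the observed-data MLE, and convergence is quick for these data.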

Xi'an