
I'm trying to understand how to reestimate parameters, as part of the EM algorithm. As a simple example, I'm trying to derive the reestimation formula for an exponential distribution. Here's the setup:

Suppose we have an observation sequence of positive real numbers $\{x_i: i=1,2,...,n\}$. Each observation $x_i$ could have come from any one of a set of states. Let $s_i$ be the state of the $i^{\text{th}}$ observation. Assume we know (or have estimated) the probability of each observation being in each of the states.

Now assume that in state 1, $x_i$ has an exponential distribution with density $(1/t)e^{-x_i/t}$, where $t>0$ is an unknown parameter. The goal is to find the reestimation formula for $t$.

I think the quantity we have to maximize is the following:

$\prod_i P(s_i=1)(1/t)e^{-x_i/t}$

$=(1/t^n)e^{-\sum_i x_i/t}\prod_i P(s_i=1)$

I then take the derivative and set equal to 0:

$[(-n/t^{n+1})e^{-\sum_i x_i/t}+(1/t^n)d/dt(-\sum_i x_i/t)e^{-\sum_i x_i/t}]\prod_i P(s_i=1)=0$

$[(-n/t^{n+1})e^{-\sum_i x_i/t}+(1/t^n)(\sum_i x_i/t^2)e^{-\sum_i x_i/t}]\prod_i P(s_i=1)=0$

$[-nt+\sum_i x_i][e^{-\sum_i x_i/t}/t^{n+2}]\prod_i P(s_i=1)=0$

$-nt+\sum_i x_i=0$

$t=\sum_i x_i/n$

But it looks like the official answer is $t=\sum_i P(s_i=1)x_i/\sum_i P(s_i=1)$
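For concreteness, here is a quick numerical sketch showing that the two formulas generally disagree (the data and the posterior weights $P(s_i=1)$ below are made up, purely for illustration):

```python
import random

random.seed(0)

# Made-up data and made-up "posteriors" P(s_i = 1), purely to
# illustrate that the two formulas give different answers.
x = [random.expovariate(1 / 2.0) for _ in range(10)]   # draws with mean 2
w = [random.random() for _ in range(10)]               # stand-ins for P(s_i = 1)

t_unweighted = sum(x) / len(x)                               # my formula
t_weighted = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)   # official formula

print(t_unweighted, t_weighted)
```

The two only coincide when all the weights are equal, so the $P(s_i=1)$ terms clearly cannot just factor out of the maximization.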

What went wrong?

Thanks

J.D.
  • @Xi'an I thought the goal of the Maximization part of the EM algorithm was to find the values of the distribution parameters that maximized ____. I'm not entirely clear what is being maximized, which is part of why I'm asking this question. I assumed it was the probability of the observation sequence given the probabilities that each term of the sequence is in each of the states. But I call them re-estimation formulas because we already have estimates of the parameters in the Expectation step, and we're updating them in the Maximization step. Isn't that right? – J.D. Apr 19 '20 at 15:25
  • I also thought the re-estimation formulas for the parameters of each state depended only on that state, not the other states, which is why I didn't specify the other state distributions. But if that is not so, let's take the simplest situation I can think of: There is only one other state (state 2), with distribution given by $\frac{1}{u}e^{(x-1)/u}$, where $u>0$ is a parameter. Also assume $0\leq x\leq 1$. – J.D. Apr 19 '20 at 15:33
  • Ok, let's try this instead: We still have $0\leq x – J.D. Apr 19 '20 at 15:50

1 Answer


In this model, assuming the $S_i$'s are iid (which is not a major loss of generality) with $p=\mathbb P(S_i=1)$, the joint distribution of $(X_i,S_i)$ is $$\mathbb{P}(S_i=s_i) f(x_i|s_i,t)=\{p\, e^{-x_i/t}/t\}^{\mathbb I_{s_i=1}}\{(1-p) \sqrt{{2}/{\pi}}\,e^{-x_i^2/2}\}^{\mathbb I_{s_i=2}}$$ and the associated completed likelihood is thus $$L^c(t|D,S) = \prod_{i=1}^n \{p\, e^{-x_i/t}/t\}^{\mathbb I_{s_i=1}}\{(1-p) \sqrt{{2}/{\pi}}\,e^{-x_i^2/2}\}^{\mathbb I_{s_i=2}}$$ (where $D$ stands for the data, i.e., the observed sample $(x_1,\ldots,x_n)$, and $S$ for the states, i.e., the latent variables $(s_1,\ldots,s_n)$).
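As a small sketch, the completed log-likelihood can be written out directly from this product (the numbers below, including $p$, $t$, and the states, are arbitrary placeholders):

```python
import math

def complete_log_lik(t, p, x, s):
    """log L^c(t | D, S) for the exponential / half-normal mixture above."""
    ll = 0.0
    for xi, si in zip(x, s):
        if si == 1:
            # log of p * exp(-x/t) / t
            ll += math.log(p) - xi / t - math.log(t)
        else:
            # log of (1 - p) * sqrt(2/pi) * exp(-x^2 / 2)
            ll += math.log(1 - p) + 0.5 * math.log(2 / math.pi) - xi ** 2 / 2
    return ll

print(complete_log_lik(t=2.0, p=0.5, x=[0.3, 1.7, 0.9], s=[1, 2, 1]))
```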

The E-step of the EM algorithm requires computing the target function \begin{align} Q(t,t') &= \mathbb E_{t}[\log L^c(t'|D,S)|D] \\ &=\sum_{i=1}^n [\log(p) -x_i/t'-\log(t')]\mathbb P_t(S_i=1|X_i=x_i)+C \end{align} where $C=\sum_{i=1}^n\log(1-p)\mathbb P_t(S_i=2|X_i=x_i)+\cdots$ depends on the data but not on the parameter $t'$. Since the $\log(p)$ term does not depend on $t'$ either, it can be absorbed into $C$, and the target simplifies further into $$Q(t,t')=\sum_{i=1}^n [-x_i/t'-\log(t')]\mathbb P_t(S_i=1|X_i=x_i)+C$$ with posterior probabilities $$\mathbb P_t(S_i=1|X_i=x_i)=\dfrac{p\, e^{-x_i/t}/t}{p\, e^{-x_i/t}/t+(1-p)\sqrt{{2}/{\pi}}\,e^{-x_i^2/2}}$$

The M-step $$t^* = \arg\max_{t'} Q(t,t')$$ leads to the first-order derivative equation $$\sum_{i=1}^n [x_i/(t^*)^2-1/t^*]\mathbb P_t(S_i=1|X_i=x_i)=0$$ hence [multiplying both sides of the equation by $(t^*)^2$] to $$t^* = \sum_{i=1}^n x_i\mathbb P_t(S_i=1|X_i=x_i)\Big/\sum_{i=1}^n \mathbb P_t(S_i=1|X_i=x_i)$$
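A minimal numerical sketch of these two steps (assuming, as in the derivation, a known $p$ and a half-normal second component; the simulated data, sample size, and starting value are my own choices):

```python
import math
import random

random.seed(1)

p = 0.5   # P(S_i = 1), treated as known here (a simplifying assumption)
t = 1.0   # initial guess for t

# Simulate data from the mixture: with probability p an exponential
# with mean 2 (the "true" t), otherwise a half-normal.
x = [random.expovariate(1 / 2.0) if random.random() < p
     else abs(random.gauss(0.0, 1.0))
     for _ in range(500)]

for _ in range(50):
    # E-step: posterior P_t(S_i = 1 | X_i = x_i) under the current t
    post = []
    for xi in x:
        f1 = p * math.exp(-xi / t) / t
        f2 = (1 - p) * math.sqrt(2 / math.pi) * math.exp(-xi ** 2 / 2)
        post.append(f1 / (f1 + f2))
    # M-step: the weighted-mean update t* derived above
    t = sum(w * xi for w, xi in zip(post, x)) / sum(post)

print(t)
```

With enough data, the iterates should settle near the value of $t$ used in the simulation.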

Xi'an
  • Hi, thanks for the reply. What is $p$? And what is $D$? – J.D. Apr 20 '20 at 01:51
  • Thanks for the clarification. But I'm still not sure what $p$ is. From context, it seems like it should maybe be $P(S_i=1)$? But it doesn't have the subscript $i$. And shouldn't that have already been accounted for by the $\mathbb{I}_{s_i=1}$? – J.D. Apr 20 '20 at 08:04
  • Ok, I see now. $p=P(S_i=1)$ and $1-p=P(S_i=2)$, assuming the $S_i$ are iid. As for the "This even simplifies further into" step though, why does the $\log(p)$ term simply disappear? (Granted it will disappear later when taking the derivative since it does not depend on t'; but why does it disappear at this step?) Thanks again – J.D. Apr 20 '20 at 13:28