
In the E-step of the EM algorithm we maximize $$\max_\theta \sum_Z p(Z\mid X,\theta_\text{old})\log p(X,Z\mid\theta).$$ This expression is called the expectation of the complete data log-likelihood $\log p(X,Z\mid\theta)$. I do not see any expectation, which is defined as $E(Y)=\sum_YYp(Y)$. Why is it called that? How can I see that it is an expectation?

tomka
    I'd write $\displaystyle \operatorname{E}(Y) = \sum_y y p(y),$ being careful about which $Y\text{s}$ are capital and which $y\text{s}$ are lower-case. – Michael Hardy Apr 28 '17 at 22:39

1 Answer


You are combining both steps. Breaking them out (e.g. see here), you have

E step

$Q(\theta\mid\theta_\text{old})=\sum_Z p(Z\mid X,\theta_\text{old})\log p(X,Z\mid\theta)$

M step

$\theta_\text{new}=\arg\max_\theta Q(\theta\mid\theta_\text{old})$

For the "E step", you are computing the average $\mathbb{E}\big[\log p(X,Z\mid\theta)\big]$, taking $Z\sim p(Z\mid X,\theta_\text{old})$.
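To make the expectation concrete, here is a minimal numerical sketch for a two-component Gaussian mixture (the data, parameter values, and helper `normal_pdf` are all illustrative assumptions, not part of the question): $Q$ is literally a weighted sum over the values of $Z$, with weights given by the posterior $p(Z\mid X,\theta_\text{old})$.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    # density of N(mu, sigma^2), written out to stay dependency-free
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# toy observed data X and an assumed two-component mixture (illustrative values)
x = np.array([-2.1, -1.9, 0.2, 1.8, 2.2])
pi_old = np.array([0.5, 0.5])       # mixture weights, held fixed here
mu_old = np.array([-2.0, 2.0])      # current estimates theta_old
sigma = 1.0                         # known variance, for simplicity

# E step: posterior p(z_i = k | x_i, theta_old), the "responsibilities"
lik = pi_old * normal_pdf(x[:, None], mu_old, sigma)   # shape (n, 2)
resp = lik / lik.sum(axis=1, keepdims=True)            # rows sum to 1

# Q(theta | theta_old): each log p(x_i, z_i | theta) term is weighted by
# the posterior probability of z_i -- exactly E_Z[log p(X, Z | theta)]
def Q(mu):
    log_joint = np.log(pi_old) + np.log(normal_pdf(x[:, None], mu, sigma))
    return np.sum(resp * log_joint)

# M step: for the means this argmax has a closed form (posterior-weighted mean)
mu_new = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
print(mu_new)                   # updated component means
print(Q(mu_new) >= Q(mu_old))  # True: the M step cannot decrease Q
```

The point of the sketch is that `resp` plays the role of $p(Z\mid X,\theta_\text{old})$: once it is computed, `Q(mu)` is an ordinary expectation of the complete-data log-likelihood over the hidden labels.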

GeoMatt22
  • So this is the expectation $\mathbb{E}_Z$, i.e. with respect to $Z$ only? It appears that otherwise we would need to weight by $p(Z,X\mid\theta_\text{old})$. – tomka Apr 28 '17 at 19:20
  • Yes, $X$ is the observed data which does not change. The average is over possible values of the hidden data $Z$. – GeoMatt22 Apr 28 '17 at 19:28
  • But that means that the "E-step" is not a step at all, in the usual meaning of this word, doesn't it? – Elmar Zander Dec 14 '18 at 14:10