
In the E-step of the EM algorithm we maximize $$\max_\theta \sum_Z p(Z\mid X,\theta_\text{old})\log p(X,Z\mid\theta).$$ This expression is called the expectation of the complete data log-likelihood $\log p(X,Z\mid\theta)$. I do not see any expectation, which is defined as $E(Y)=\sum_YYp(Y)$. Why is it called that? How can I see that it is an expectation?

tomka
    I'd write $\displaystyle \operatorname{E}(Y) = \sum_y y p(y),$ being careful about which $Y\text{s}$ are capital and which $y\text{s}$ are lower-case. – Michael Hardy Apr 28 '17 at 22:39

1 Answer


You are combining both steps. Breaking them out (e.g. see here), you have

E step

$Q(\theta\mid\theta_\text{old})=\sum_Z p(Z\mid X,\theta_\text{old})\log p(X,Z\mid\theta)$

M step

$\theta_\text{new}=\arg\max_\theta Q(\theta\mid\theta_\text{old})$

For the "E step", you are computing the average $\mathbb{E}\big[\log p(X,Z\mid\theta)\big]$, taking $Z\sim p(Z\mid X,\theta_\text{old})$.
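To make the expectation concrete, here is a minimal numerical sketch for a two-component Gaussian mixture (the data, parameter values, and helper `normal_pdf` are all illustrative assumptions, not part of the question): $Q$ is literally a weighted sum over the values of $Z$, with weights given by the posterior $p(Z\mid X,\theta_\text{old})$.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    # density of N(mu, sigma^2), written out to stay dependency-free
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# toy observed data X and an assumed two-component mixture (illustrative values)
x = np.array([-2.1, -1.9, 0.2, 1.8, 2.2])
pi_old = np.array([0.5, 0.5])       # mixture weights, held fixed here
mu_old = np.array([-2.0, 2.0])      # current estimates theta_old
sigma = 1.0                         # known variance, for simplicity

# E step: posterior p(z_i = k | x_i, theta_old), the "responsibilities"
lik = pi_old * normal_pdf(x[:, None], mu_old, sigma)   # shape (n, 2)
resp = lik / lik.sum(axis=1, keepdims=True)            # rows sum to 1

# Q(theta | theta_old): each log p(x_i, z_i | theta) term is weighted by
# the posterior probability of z_i -- exactly E_Z[log p(X, Z | theta)]
def Q(mu):
    log_joint = np.log(pi_old) + np.log(normal_pdf(x[:, None], mu, sigma))
    return np.sum(resp * log_joint)

# M step: for the means this argmax has a closed form (posterior-weighted mean)
mu_new = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
print(mu_new)                   # updated component means
print(Q(mu_new) >= Q(mu_old))  # True: the M step cannot decrease Q
```

The point of the sketch is that `resp` plays the role of $p(Z\mid X,\theta_\text{old})$: once it is computed, `Q(mu)` is an ordinary expectation of the complete-data log-likelihood over the hidden labels.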

GeoMatt22
  • So this is the expectation $\mathbb{E}_Z$, i.e. with respect to $Z$ only? It appears that otherwise we would need to weight by $p(Z,X\mid\theta_\text{old})$. – tomka Apr 28 '17 at 19:20
  • Yes, $X$ is the observed data which does not change. The average is over possible values of the hidden data $Z$. – GeoMatt22 Apr 28 '17 at 19:28
  • But that means that the "E-step" is not a step at all, in the usual meaning of this word, doesn't it? – Elmar Zander Dec 14 '18 at 14:10