In the E-step of the EM algorithm we solve $$\max_\theta \sum_Z p(Z\mid X,\theta_\text{old})\log p(X,Z\mid\theta).$$ The summation is called the expectation of the complete-data log-likelihood $\log p(X,Z\mid\theta)$. I do not see any expectation here; an expectation is defined as $E(Y)=\sum_Y Y p(Y)$. Why is it called that, and how can I see that it is an expectation?

- I'd write $\displaystyle \operatorname{E}(Y) = \sum_y y\,p(y),$ being careful about which $Y\text{s}$ are capital and which $y\text{s}$ are lower-case. – Michael Hardy Apr 28 '17 at 22:39
1 Answer
You are combining both steps. Breaking them out (e.g. see here), you have
E step
$Q(\theta\mid\theta_\text{old})=\sum_Z p(Z\mid X,\theta_\text{old})\log p(X,Z\mid\theta)$
M step
$\theta_\text{new}=\operatorname*{arg\,max}_\theta\, Q(\theta\mid\theta_\text{old})$
For the "E step", you are computing the average $\mathbb{E}\big[\log p(X,Z\mid\theta)\big]$, taking $Z\sim p(Z\mid X,\theta_\text{old})$.

- So this is the expectation $\mathbb{E}_Z$, i.e. with respect to $Z$ only? It appears that otherwise we would need to weight by $p(Z,X\mid\theta_\text{old})$. – tomka Apr 28 '17 at 19:20
- Yes, $X$ is the observed data which does not change. The average is over possible values of the hidden data $Z$. – GeoMatt22 Apr 28 '17 at 19:28
- But that means that the "E-step" is not a step at all, in the usual sense of the word, doesn't it? – Elmar Zander Dec 14 '18 at 14:10