I am trying to understand the EM algorithm. In the expectation step, we need to compute $E_{Z|X}[\log p(X,Z)] = \sum_{Z} p(Z|X) \log p(X,Z)$, where $Z$ is the hidden variable and $X$ is the observed variable. My understanding is that we compute the expected value of $\log p(X,Z)$ because we cannot evaluate $\log p(X,Z)$ exactly: it depends on $Z$, whose value we never observe. But I have two separate points of confusion here:
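For concreteness, here is how I picture that E-step quantity for a two-component 1-D Gaussian mixture (a toy example of my own; all names and numbers below are just illustrative):

```python
import numpy as np
from scipy.stats import norm

# Toy two-component 1-D Gaussian mixture (illustrative values only)
x = np.array([0.2, -0.9, 3.1, 2.7, 0.1])    # observed data X
pi = np.array([0.5, 0.5])                    # mixing weights   (part of theta)
mu = np.array([0.0, 3.0])                    # component means  (part of theta)
sigma = np.array([1.0, 1.0])                 # component stds   (part of theta)

# Joint p(x_n, Z=k) for every data point n and component k, shape (n, 2)
joint = pi * norm.pdf(x[:, None], mu, sigma)

# Posterior responsibilities p(Z=k | x_n) under the current theta
resp = joint / joint.sum(axis=1, keepdims=True)

# E-step quantity: sum_n sum_k p(Z=k | x_n) * log p(x_n, Z=k)
Q = np.sum(resp * np.log(joint))
```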
First, why don't we do, say, block coordinate descent instead? That is, why don't we update $Z$ and the parameters of the model (say $\theta$) alternately, computing $Z$ in one step using the current value of $\theta$, and $\theta$ in the other step using the current value of $Z$, as in the sketch below?
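Concretely, the alternating scheme I have in mind looks something like this (continuing the toy mixture above; I believe this is sometimes called "hard EM", though that label is my assumption):

```python
# Alternating hard-assignment scheme (my sketch, not the actual EM algorithm)
for _ in range(20):
    # Z-step: with theta fixed, pick the single most likely z for each point
    joint = pi * norm.pdf(x[:, None], mu, sigma)
    z = joint.argmax(axis=1)                 # a hard value of Z, not an expectation

    # theta-step: with Z fixed, refit the parameters by maximum likelihood
    for k in range(2):
        pts = x[z == k]
        if pts.size:                         # skip components with no points
            pi[k] = pts.size / x.size
            mu[k] = pts.mean()
            sigma[k] = max(pts.std(), 1e-3)  # floor to avoid zero variance
```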
Second, suppose I accept that we cannot do block coordinate descent and really do need the expectation of $\log p(X,Z)$. Why do we then compute $E_{Z|X}[\log p(X,Z)] = \sum_{Z} p(Z|X) \log p(X,Z)$ rather than $E_Z[\log p(X,Z)] = \sum_{Z} p(Z) \log p(X,Z)$? What does it mean, intuitively or in words, to take an expected value with respect to a conditional probability?
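To make the contrast concrete, the two candidate weightings on the same toy mixture would be (again my own illustration):

```python
# Two candidate expectations of log p(X, Z) on the toy mixture above
log_joint = np.log(pi * norm.pdf(x[:, None], mu, sigma))

# (a) Expectation under the marginal p(Z) = pi: the same weights for every
#     data point, regardless of what we observed
E_marginal = np.sum(pi * log_joint)

# (b) Expectation under the posterior p(Z | X): each x_n gets its own weights,
#     reflecting how plausible each z is given that particular observation
resp = np.exp(log_joint) / np.exp(log_joint).sum(axis=1, keepdims=True)
E_posterior = np.sum(resp * log_joint)
```

If I understand correctly, (b) is the E-step quantity from my first equation, while (a) would weight each completion $z$ without looking at the data at all.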