One thing I cannot understand about the EM (expectation-maximization) algorithm: for an observable variable $Y$ and a latent variable $Z,$ why don't we directly take the MLE (maximum likelihood estimate) based on $Y$ alone? Since $Y$'s marginal distribution does not involve $Z,$ we don't need any observations of $Z.$
Is the marginal distribution of $Y,$ $f(y|\Theta),$ hard to calculate? At least for the examples given in the book, we know the marginal distributions explicitly.
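To be concrete, by "directly take the MLE of $Y$" I mean maximizing the observed-data log-likelihood over $\Theta$ alone,
$$\hat\Theta = \arg\max_{\Theta}\ \sum_{j=1}^{N}\log f(y_j|\Theta),$$
where $y_1,\dots,y_N$ are the observed values of $Y.$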
For example:
latent variable: $Z\sim Bernoulli(\pi),$
observable variable: $(Y|Z=1)\sim Bernoulli(p),\ (Y|Z=0)\sim Bernoulli(q),$
then we know $f(y|\pi,p,q) = \pi p^y(1-p)^{1-y} + (1-\pi)q^y(1-q)^{1-y}.$
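This marginal pmf is easy to evaluate in code. Here is a minimal sketch; the parameter values and the observations of $Y$ below are made up purely for illustration.

```python
import numpy as np

# Marginal pmf of the two-coin Bernoulli mixture:
# f(y | pi, p, q) = pi * p^y (1-p)^(1-y) + (1-pi) * q^y (1-q)^(1-y)
def marginal_pmf(y, pi, p, q):
    return pi * p**y * (1 - p)**(1 - y) + (1 - pi) * q**y * (1 - q)**(1 - y)

y = np.array([1, 1, 0, 1, 0, 0, 1])           # hypothetical observations of Y
print(marginal_pmf(y, pi=0.4, p=0.7, q=0.3))  # per-observation marginal probabilities
```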
Another example:
latent variable: $P(Z=i) = p_i,\ i=1,\dots,k,$
observable variable: $(Y|Z=i)\sim n(\mu_i,\sigma^2_i),$ with PDF denoted $n(y|\mu_i,\sigma^2_i),$
then we know $f(y|p_1,\dots,p_k,\mu_1,\dots,\mu_k,\sigma^2_1,\dots,\sigma^2_k) = \sum\limits_{i=1}^{k}p_i\cdot n(y|\mu_i,\sigma^2_i).$
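Likewise, this mixture density is straightforward to evaluate. A minimal sketch, assuming a two-component mixture with made-up weights, means, and standard deviations:

```python
import numpy as np
from scipy.stats import norm

# Marginal density of the Gaussian mixture: f(y | ...) = sum_i p_i * n(y | mu_i, sigma_i^2)
def marginal_pdf(y, weights, mus, sigmas):
    y = np.atleast_1d(y)[:, None]               # shape (N, 1)
    comps = norm.pdf(y, loc=mus, scale=sigmas)  # shape (N, k): component densities
    return comps @ weights                      # shape (N,): mixture density

weights = np.array([0.3, 0.7])    # hypothetical p_i
mus     = np.array([-1.0, 2.0])   # hypothetical mu_i
sigmas  = np.array([1.0, 0.5])    # hypothetical sigma_i
print(marginal_pdf([0.0, 1.5, 3.0], weights, mus, sigmas))
```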
In both examples the marginal distribution of $Y$ is known in closed form.
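So what I would do instead of EM is maximize the marginal log-likelihood numerically. A minimal sketch for the two-component Gaussian mixture above; the data, starting values, and the logit/log reparameterization are my own made-up choices, not taken from the book.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# "Directly take the MLE of Y": maximize sum_j log f(y_j | Theta), no latent Z involved.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(-1.0, 1.0, 60), rng.normal(2.0, 0.5, 140)])  # hypothetical sample

def neg_log_lik(theta):
    a, mu1, mu2, t1, t2 = theta
    w = 1.0 / (1.0 + np.exp(-a))        # mixture weight kept in (0, 1) via a logit parameter
    s1, s2 = np.exp(t1), np.exp(t2)     # standard deviations kept positive via log parameters
    f = w * norm.pdf(y, mu1, s1) + (1.0 - w) * norm.pdf(y, mu2, s2)
    return -np.sum(np.log(f))

res = minimize(neg_log_lik, x0=np.array([0.0, -0.5, 1.5, 0.0, 0.0]), method="Nelder-Mead")
print(res.x, -res.fun)  # fitted parameters (transformed scale) and maximized log-likelihood
```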