
The EM algorithm is usually motivated by the claim that maximizing the log-likelihood directly is "complicated" or "difficult" because it requires taking the log of a weighted sum of likelihoods, i.e.

$$ \max_{\theta, \omega} \;\; \sum_i \log\Big(\sum_j \omega_j \cdot \Pr(X_i \mid \theta_j)\Big) $$

where there are $j=1\ldots J$ mixture classes, $\omega_j$ is the probability of belonging to class $j$, and $\sum_j \omega_j = 1$. However, many modern scientific computing frameworks now have built-in automatic differentiation, and stochastic gradient descent algorithms routinely minimize loss functions far more complicated than a standard mixture model (although with no guarantee of finding a global minimum). Given how easy these gradient descent algorithms are to use, through, for example, TensorFlow or PyTorch, is there any compelling reason to still use EM? Especially if, instead of maximizing the above, I also want to model the class weights with a parametric model:

$$ \max_{\theta, \omega} \;\; \sum_i \log\Big(\sum_j g(\omega_j) \cdot \Pr(X_i \mid \theta_j)\Big) $$
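For concreteness, here is a minimal sketch of the kind of direct gradient-based fit I have in mind, in PyTorch (the toy data, number of components, and optimizer settings are purely illustrative):

```python
import torch

torch.manual_seed(0)

# Toy data: two 1-D Gaussian clusters (illustrative only).
X = torch.cat([torch.randn(200) - 2.0, torch.randn(200) + 3.0])

J = 2                                             # number of mixture components
mu = torch.randn(J, requires_grad=True)           # component means
log_sigma = torch.zeros(J, requires_grad=True)    # log std devs, so sigma stays positive
logits = torch.zeros(J, requires_grad=True)       # unconstrained weights; softmax -> omega

opt = torch.optim.Adam([mu, log_sigma, logits], lr=0.05)

for step in range(2000):
    opt.zero_grad()
    sigma = log_sigma.exp()
    log_omega = torch.log_softmax(logits, dim=0)  # log omega_j, with sum_j omega_j = 1
    # log Pr(X_i | theta_j) for every (i, j) pair, shape (N, J)
    log_pdf = torch.distributions.Normal(mu, sigma).log_prob(X.unsqueeze(1))
    # sum_i log( sum_j omega_j * Pr(X_i | theta_j) ), computed stably via logsumexp
    log_lik = torch.logsumexp(log_omega + log_pdf, dim=1).sum()
    (-log_lik).backward()                         # minimize the negative log-likelihood
    opt.step()

print(mu.detach(), log_sigma.exp().detach(), torch.softmax(logits, dim=0).detach())
```

The unconstrained `logits` could just as well be the output of any parametric model of the weights (the $g(\omega_j)$ above), as long as a softmax or similar keeps them on the simplex.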

I would be grateful if anyone could point me to the relevant literature. Thanks!

shadowprice
  • They are both gradient descent, but they operate in different domains. – EngrStudent Apr 03 '18 at 02:03
  • 1
    maybe take a look at this question https://stats.stackexchange.com/questions/262538/why-expectation-maximization-is-important-for-mixture-models – dontloo Apr 03 '18 at 02:23

0 Answers