Questions tagged [expectation-maximization]

An optimization algorithm often used for maximum-likelihood estimation in the presence of missing data.

The expectation-maximization (EM) algorithm is an iterative optimization algorithm and an instance of the broader class of MM (minorization-maximization, also written majorization-minimization) algorithms. It is a cornerstone of statistics because it is particularly suitable for likelihood maximization problems in which direct optimization is intractable, especially in the presence of missing data or latent variables; each iteration is guaranteed not to decrease the observed-data likelihood, although the algorithm may still converge to a local rather than a global maximum.

More specifically, the EM algorithm is an iterative method for finding maximum likelihood estimates in models with latent variables or missing data. Each iteration alternates two steps (a minimal code sketch follows the list):

  • Expectation step: compute the expected complete-data log-likelihood, where the expectation is taken over the latent variables given the observed data and the current parameter estimate.
  • Maximization step: update the parameter estimate by maximizing this expected complete-data log-likelihood.
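
To make the two steps concrete, here is a minimal sketch (not part of the original tag wiki) of EM for a two-component univariate Gaussian mixture in R; the data and starting values below are invented for illustration:

    # Simulated data from two Gaussian components; the component labels are "missing".
    set.seed(1)
    x <- c(rnorm(150, -2, 1), rnorm(100, 3, 1.5))

    # Initial guesses for the mixing weight, the means and the standard deviations.
    p <- 0.5; mu <- c(-1, 1); s <- c(1, 1)

    for (iter in 1:200) {
      # E step: posterior probability that each observation came from component 1,
      # computed under the current parameter estimate.
      d1 <- p       * dnorm(x, mu[1], s[1])
      d2 <- (1 - p) * dnorm(x, mu[2], s[2])
      r  <- d1 / (d1 + d2)

      # M step: weighted maximum-likelihood updates of the parameters.
      p     <- mean(r)
      mu[1] <- sum(r * x) / sum(r)
      mu[2] <- sum((1 - r) * x) / sum(1 - r)
      s[1]  <- sqrt(sum(r * (x - mu[1])^2) / sum(r))
      s[2]  <- sqrt(sum((1 - r) * (x - mu[2])^2) / sum(1 - r))
    }

    c(p, mu, s)  # should end up near the simulating values 150/250, -2, 3, 1, 1.5

The observed-data log-likelihood, sum(log(d1 + d2)), never decreases across these iterations; that monotonicity is the property several of the questions below ask about.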
560 questions
125 votes, 9 answers

Numerical example to understand Expectation-Maximization

I am trying to get a good grasp on the EM algorithm, to be able to implement and use it. I spent a full day reading the theory and a paper where EM is used to track an aircraft using the position information coming from a radar. Honestly, I don't…
61 votes, 3 answers

Clustering with K-Means and EM: how are they related?

I have studied algorithms for clustering data (unsupervised learning): EM and k-means. I keep reading the following: k-means is a variant of EM, with the assumption that clusters are spherical. Can somebody explain the above sentence? I do…
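
A rough illustration of the claim quoted in this excerpt (not an answer taken from the thread; the data and names below are invented): with equal spherical covariances and hard assignments, the EM updates collapse to the k-means updates.

    set.seed(3)
    X <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
               matrix(rnorm(100, mean = 5), ncol = 2))
    km <- kmeans(X, centers = 2, nstart = 10)

    # "Hard EM": a degenerate E step assigns each point entirely to the nearest mean,
    # and the M step recomputes each mean from its assigned points.
    mu <- X[sample(nrow(X), 2), , drop = FALSE]
    for (it in 1:20) {
      d <- as.matrix(dist(rbind(mu, X)))[1:2, -(1:2)]  # mean-to-point distances
      z <- apply(d, 2, which.min)                      # hard assignments
      mu <- rbind(colMeans(X[z == 1, , drop = FALSE]),
                  colMeans(X[z == 2, , drop = FALSE]))
    }

    mu          # roughly matches km$centers, up to relabelling of the clusters
    km$centers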
34 votes, 1 answer

Relation between variational Bayes and EM

I read somewhere that the Variational Bayes method is a generalization of the EM algorithm. Indeed, the iterative parts of the algorithms are very similar. In order to test whether the EM algorithm is a special case of Variational Bayes, I tried…
Ufuk Can Bicici
32 votes, 3 answers

What is the difference between EM and Gradient Ascent?

What is the difference between the algorithms EM (Expectation Maximization) and Gradient Ascent (or descent)? Is there any condition under which they are equivalent?
Aslan986
31 votes, 2 answers

Why is the Expectation Maximization algorithm guaranteed to converge to a local optimum?

I have read a couple of explanations of the EM algorithm (e.g. from Bishop's Pattern Recognition and Machine Learning and from Rogers and Girolami's First Course in Machine Learning). The derivation of EM is OK, I understand it. I also understand why the…
michal
26 votes, 3 answers

Why is the expectation maximization algorithm used?

From what little I know, the EM algorithm can be used to find the maximum likelihood when setting the partial derivatives of the likelihood with respect to the parameters to zero gives a set of equations that cannot be solved analytically. But is the…
user782220
25 votes, 4 answers

EM maximum likelihood estimation for Weibull distribution

Note: I am posting a question from a former student of mine unable to post on his own for technical reasons. Given an iid sample $x_1,\ldots,x_n$ from a Weibull distribution with pdf $$ f_k(x) = k x^{k-1} e^{-x^k} \quad x>0 $$ is there a useful…
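
For context on why a closed-form answer is elusive here (a standard derivation, not taken from the thread): the log-likelihood for this pdf is

$$\ell(k) = n\log k + (k-1)\sum_{i=1}^n \log x_i - \sum_{i=1}^n x_i^k,$$

and setting its derivative to zero,

$$\frac{\partial \ell}{\partial k} = \frac{n}{k} + \sum_{i=1}^n \log x_i - \sum_{i=1}^n x_i^k \log x_i = 0,$$

has no solution for $k$ in closed form, which is what motivates looking for an EM-style or other iterative scheme.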
22 votes, 2 answers

EM algorithm manually implemented

I want to implement the EM algorithm manually and then compare it to the results of the normalmixEM function of the mixtools package. Of course, I would be happy if they both led to the same results. The main reference is Geoffrey McLachlan (2000), Finite…
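
For reference, a minimal comparison call against the package mentioned in the excerpt might look like this (the data here are simulated for illustration, not taken from the thread; a hand-rolled EM such as the sketch under the tag description above would be run on the same x):

    library(mixtools)             # provides normalmixEM
    set.seed(2)
    x <- c(rnorm(100, 0, 1), rnorm(100, 4, 1.5))

    fit <- normalmixEM(x, k = 2)  # EM fit of a two-component normal mixture
    fit$lambda                    # estimated mixing proportions
    fit$mu                        # estimated component means
    fit$sigma                     # estimated component standard deviations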
21 votes, 5 answers

Motivation of Expectation Maximization algorithm

In the EM algorithm approach we use Jensen's inequality to arrive at $$\log p(x|\theta) \geq \int \log p(z,x|\theta) p(z|x,\theta^{(k)}) dz - \int \log p(z|x,\theta) p(z|x,\theta^{(k)})dz$$ and define $\theta^{(k+1)}$ by $$\theta^{(k+1)}=\arg…
user782220
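
As a reminder of the notation behind excerpts like the one above (standard EM bookkeeping, not quoted from the question), the bound is usually packaged into the Q-function, and the M step maximizes it:

$$Q(\theta \mid \theta^{(k)}) = \int \log p(z,x\mid\theta)\, p(z\mid x,\theta^{(k)})\, dz, \qquad \theta^{(k+1)} = \arg\max_{\theta}\, Q(\theta \mid \theta^{(k)}).$$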
18 votes, 2 answers

Why is optimizing a mixture of Gaussians directly computationally hard?

Consider the log likelihood of a mixture of Gaussians: $$l(S_n; \theta) = \sum^n_{t=1}\log f(x^{(t)}|\theta) = \sum^n_{t=1}\log\left\{\sum^k_{i=1}p_i f(x^{(t)}|\mu^{(i)}, \sigma^2_i)\right\}$$ I was wondering why it was computationally hard to…
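
One way to see where the difficulty comes from (a standard argument sketched here, not quoted from the thread): because the logarithm sits outside the inner sum, the score equations couple all components. For example,

$$\frac{\partial l}{\partial \mu^{(j)}} = \sum_{t=1}^n \frac{p_j\, \partial f(x^{(t)}\mid\mu^{(j)},\sigma^2_j)/\partial\mu^{(j)}}{\sum_{i=1}^k p_i f(x^{(t)}\mid\mu^{(i)},\sigma^2_i)} = 0$$

has no closed-form solution, whereas the expected complete-data log-likelihood used by EM places the logarithm on a single Gaussian term per observation and yields closed-form updates.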
18 votes, 1 answer

E-M, is there an intuitive explanation?

The E-M procedure appears, to the uninitiated, as more or less black magic. Estimate parameters of an HMM (for example) using supervised data. Then decode untagged data, using forward-backward to 'count' events as if the data were tagged, more or…
bmargulies
17 votes, 1 answer

Training a basic Markov Random Field for classifying pixels in an image

I am attempting to learn how to use Markov Random Fields to segment regions in an image. I do not understand some of the parameters in the MRF or why the expectation maximisation I perform fails to converge to a solution sometimes. Starting from…
16 votes, 2 answers

Why is Expectation Maximization important for mixture models?

A large body of literature emphasizes the Expectation Maximization method for mixture models (mixtures of Gaussians, hidden Markov models, etc.). Why is EM important? EM is just one way to do the optimization and is not as widely used as gradient-based method…
15 votes, 2 answers

Finding a category with the maximum likelihood method

Let's say that we have information on men's and women's heights. R code:

    set.seed(1)
    Women=rnorm(80, mean=168, sd=6)
    Men=rnorm(120, mean=182, sd=7)
    par(mfrow=c(2,1))
    hist(Men, xlim=c(150, 210), col="skyblue")
    hist(Women, xlim=c(150, 210),…
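
One reading of this (truncated) setup is classifying a new height by whichever group's fitted normal density is larger; a hedged sketch, reusing the simulated samples from the excerpt and a hypothetical new_height value:

    set.seed(1)
    Women <- rnorm(80, mean = 168, sd = 6)
    Men   <- rnorm(120, mean = 182, sd = 7)

    # Plug-in fits for each group: sample mean and standard deviation.
    mu_w <- mean(Women); sd_w <- sd(Women)
    mu_m <- mean(Men);   sd_m <- sd(Men)

    # Classify a new height by comparing the two fitted normal densities.
    new_height <- 175
    if (dnorm(new_height, mu_w, sd_w) > dnorm(new_height, mu_m, sd_m)) "Woman" else "Man"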
15 votes, 1 answer

Why should one use EM vs. say, Gradient Descent with MLE?

Mathematically, it's often seen that expressions and algorithms for Expectation Maximization (EM) are simpler for mixed models, yet it seems that almost everything (if not everything) that can be solved with EM can also be solved with MLE (by,…