When would I use EM instead of k-means?

Question

When would I want to assign cluster probabilities to patterns instead of hard assignments to clusters? Can someone elaborate?

score 5 · Accepted Answer · edited Apr 13 '17 at 12:44

The prototypical cases are situations in which there is good reason to believe that there are clusters, but there isn't any clear separation between them. In cases like that, there will be real uncertainty about your cluster assignments, so it is ideal to use an approach that reflects that uncertainty. Using a finite Gaussian mixture model (GMM) is one way to respect that fact about your situation. (Note that the EM algorithm is just the way you estimate the GMM; it isn't the clustering model itself.) For what it's worth, there are other options, such as fuzzy k-means.
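To make the difference between soft and hard assignments concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the simulated data (two overlapping 1D Gaussians), the helper names `normal_pdf`, `responsibilities`, and `fit_gmm_1d`, the starting values, and the fixed number of EM iterations. It is not a production implementation, just a small two-component GMM fitted by EM.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Posterior probability that x belongs to each mixture component."""
    dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(dens)
    return [d / total for d in dens]


def fit_gmm_1d(data, n_iter=100):
    """Fit a two-component 1D Gaussian mixture with plain EM (illustrative only)."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    # Assumed starting values: equal weights, means at the data extremes.
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    for _ in range(n_iter):
        # E-step: soft assignment of every point to every component.
        resp = [responsibilities(x, weights, means, variances) for x in data]
        # M-step: re-estimate parameters from the weighted points.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # guard against a collapsing component
    return weights, means, variances


# Simulated data: two clusters that overlap (no clear separation).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(300)]
data += [rng.gauss(3.0, 1.0) for _ in range(300)]

weights, means, variances = fit_gmm_1d(data)

for x in (-1.0, 1.5, 4.0):
    probs = responsibilities(x, weights, means, variances)
    hard = probs.index(max(probs))  # what a hard assignment would report
    print(f"x={x:+.1f}  soft={[round(p, 3) for p in probs]}  hard={hard}")
```

Points far out in either tail get a probability close to 1 for one component, so a hard label loses little there. A point between the two fitted means gets a split probability, which a hard assignment would hide behind a single label.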

For some concrete examples of situations like this, it may help to read some of my answers that have discussed / demonstrated this:

I asked a similar question [here](https://stats.stackexchange.com/questions/372477/comparing-k-means-and-expectation-maximization-on-the-dataset-generated-does-k), but it is linked to cluster quality. Could you help me with this? - Suhail Gupta, Oct 18 '18 at 05:43
