
The EM algorithm is only guaranteed to find a local rather than a global maximum of the likelihood. As a consequence, the results depend on the initial conditions (e.g., on randomly chosen initial cluster centers). Some tutorials that I have seen suggest running the EM algorithm several times in such cases (a minimal restart sketch follows the list below). This, however, entails several practical considerations:

  • How many times to run the algorithm?
  • How to choose the more reliable solutions?
  • How to represent the results, if multiple solutions could be reasonable?
  • How to estimate the optimal number of clusters, provided multiple solutions?
  • When to give up altogether on clustering / mixture-modeling for the given dataset?
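
For concreteness, here is a minimal sketch (my own, not part of the original question) of the restart strategy: fit a Gaussian mixture from several random initialisations and keep the run with the highest converged log-likelihood. The data `X`, the number of components, and the number of restarts are placeholders.

```python
# Hedged sketch: multiple EM runs from random starts, keep the best by log-likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))           # placeholder data

n_restarts = 20                          # how many EM runs (question 1)
fits = []
for seed in range(n_restarts):
    gm = GaussianMixture(n_components=3, init_params="random", random_state=seed)
    gm.fit(X)
    fits.append((gm.lower_bound_, gm))   # lower bound on the average log-likelihood at convergence

best_loglik, best_model = max(fits, key=lambda t: t[0])
print("log-likelihood spread across restarts:", [round(ll, 3) for ll, _ in fits])
```

The spread of `lower_bound_` values across restarts already gives a rough feel for how multimodal the likelihood surface is for the dataset at hand.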

Background: I am clustering data in the context of enterotyping, i.e., assigning microbial compositions to several typical compositional types.

Update: Gelman-Rubin criterion
I wonder whether anything exists in the spirit of the Gelman-Rubin criterion used in a similar setting, when a mixture model is analyzed with Markov chain Monte Carlo. The criterion compares the variance among chains launched with different initial conditions to the variance within each chain; their ratio should converge to 1 if the number of MC steps is sufficient.
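
For illustration only, the basic (unsplit) potential scale reduction factor can be computed along these lines; `chains` is a hypothetical array holding one scalar trace per chain (e.g. a mixture weight), not something taken from the cited setting.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for an array of shape (m_chains, n_steps)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    B = n * chain_means.var(ddof=1)            # between-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled variance estimate
    return np.sqrt(var_hat / W)                # -> 1 as the chains mix
```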

Update 2: MC-EM and SAEM
Another approach, which is partially an alternative to multiple re-runs of EM, is Monte Carlo Expectation Maximization (MC-EM). See Rationale for MCEM and Avoiding burn-ins in MC-EM for more discussion, as well as A New Class of Stochastic EM Algorithms. Escaping Local Maxima and Handling Intractable Sampling.
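
As a toy illustration of the MC-EM idea (my own sketch, not the algorithm of the linked papers), the exact E-step responsibilities of a two-component 1-D Gaussian mixture can be replaced by Monte Carlo frequencies of sampled latent assignments; the simulated data and all tuning constants are placeholders.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 200)])  # toy data

pi, mu, sigma = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])
S = 200  # Monte Carlo samples per E-step

for it in range(50):
    # Exact posterior p(z_i = 1 | x_i, theta) ...
    p1 = pi * norm.pdf(x, mu[1], sigma[1])
    p0 = (1 - pi) * norm.pdf(x, mu[0], sigma[0])
    post1 = p1 / (p0 + p1)
    # ... approximated by Monte Carlo frequencies of sampled assignments (MC E-step)
    z = rng.random((S, x.size)) < post1          # S draws of z_i ~ Bernoulli(post1)
    r1 = z.mean(axis=0)                          # MC responsibility for component 1
    r0 = 1 - r1
    # M-step: the usual weighted updates
    pi = r1.mean()
    mu = np.array([np.sum(r0 * x) / r0.sum(), np.sum(r1 * x) / r1.sum()])
    sigma = np.sqrt(np.array([np.sum(r0 * (x - mu[0])**2) / r0.sum(),
                              np.sum(r1 * (x - mu[1])**2) / r1.sum()]))

print("pi, mu, sigma:", round(pi, 3), np.round(mu, 3), np.round(sigma, 3))
```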

Roger Vadim

2 Answers

How many times to run the algorithm?

There is no hard rule for the number of repetitions. The number should increase with the dimension of the parameter but the spread of the starting values matters as well.

How to choose the more reliable solutions?

The absolute rule is to take the solution with the highest observed likelihood. If the likelihood cannot be computed, a Monte Carlo approximation could be used.

How to represent the results, if multiple solutions could be reasonable?

If the likelihood values cannot be computed, there is no way to pick one.

How to estimate the optimal number of clusters, provided multiple solutions?

Express the number of clusters in terms of the latent variables and compute its expectation given the EM estimate of the parameters.
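
One hedged reading of this suggestion, assuming independent latent assignments with the fitted posterior probabilities `resp` (e.g. as returned by `GaussianMixture.predict_proba`), is the expected number of non-empty components:

```python
import numpy as np

def expected_occupied_components(resp):
    """resp: hypothetical (n_samples, n_components) array of responsibilities."""
    resp = np.asarray(resp)
    # P(component k receives no observation) = prod_i (1 - r_ik), assuming independence
    p_empty = np.prod(1.0 - resp, axis=0)
    return np.sum(1.0 - p_empty)   # expected number of occupied components
```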

When to give up altogether on clustering / mixture-modeling for the given dataset?

There is no answer without further details.

Xi'an
  • Thanks, what you say makes a lot of sense, and some of these judgements are certainly subjective. I had, however, hoped for an estimate of the number of reruns needed to be statistically reliable. Also, increasing the number of clusters increases the number of parameters - is it appropriate to use AIC or BIC here? – Roger Vadim Dec 23 '21 at 11:38
  • It is correct that you cannot estimate the number $k$ of components by sheer MLE. AIC, BIC, or WAIC are penalisations that can be used for this purpose. Here is a [reference that tries to address this problem](https://www.researchgate.net/publication/229009009_Detecting_the_Number_of_Clusters_during_Expectation-Maximization_Clustering_Using_Information_Criterion). – Xi'an Dec 23 '21 at 16:09
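
For completeness, a minimal sketch (mine, not Xi'an's) of the AIC/BIC route mentioned in the comment: fit mixtures with an increasing number of components and keep the value that minimises the criterion. The data and the range of candidate k are placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))    # placeholder data

# BIC per candidate number of components; n_init restarts guard against bad local maxima
bics = {k: GaussianMixture(n_components=k, n_init=10, random_state=0).fit(X).bic(X)
        for k in range(1, 8)}
best_k = min(bics, key=bics.get)
print(bics, "-> chosen number of components:", best_k)
```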

This does not fully answer the question as posed, but it came up as a practical approach in K-means clustering: one runs the algorithm multiple times and takes the run that minimizes the desired loss function as the one representing the global minimum.

This is, e.g., the approach adopted by the scikit-learn implementation of K-means:

n_init int, default=10 Number of times the k-means algorithm will be run with different centroid seeds. The final results will be the best output of n_init consecutive runs in terms of inertia.

inertia_ float Sum of squared distances of samples to their closest cluster center, weighted by the sample weights if provided.

This strategy is also discussed in the thread Avoiding local minima when using Kmeans.
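
A small usage sketch of the quoted behaviour (placeholder data): scikit-learn runs the `n_init` restarts internally and exposes the loss of the best run as `inertia_`.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))    # placeholder data

# 20 internal restarts; the fit with the lowest inertia is kept
km = KMeans(n_clusters=3, n_init=20, random_state=0).fit(X)
print("best-of-20 inertia:", km.inertia_)
```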

Roger Vadim