A Dirichlet process mixture model (DPMM) is a non-parametric
model, though the name is a bit misleading: you still need to choose the base measure and the concentration parameter, which are parameters.
However, when you consider the BIC, the difference between the nonparametric and parametric models becomes more obvious.
Consider the BIC
equation:
$\mathrm{BIC} = k\ln(n) - 2\ln(\widehat{L})$
where $k$ is the number of estimated parameters, $n$ is the number of data points, and $\widehat{L}$ is the maximized likelihood of the model.
When working with the (finite) GMM formulation, you mostly use the BIC criterion to find the optimal number of clusters. There is a tradeoff: if you increase $k$, the first term rises while the second drops, since the likelihood of the model increases. This is what guarantees that a model with $K = N$
(i.e. a cluster for each point, the maximum-likelihood extreme) will not necessarily beat models with fewer clusters and a lower likelihood, as the sketch below illustrates.
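A minimal sketch of this sweep, using scikit-learn's `GaussianMixture` (the toy data and the range of $K$ are my own choices for illustration, not from the original answer):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical toy data: three well-separated 1-D Gaussian clusters.
X = np.concatenate([
    rng.normal(-5, 1, 200),
    rng.normal(0, 1, 200),
    rng.normal(5, 1, 200),
]).reshape(-1, 1)

# Sweep the number of components and record the BIC of each fit.
# sklearn's bic() computes k*ln(n) - 2*ln(L_hat); lower is better.
bics = {K: GaussianMixture(n_components=K, random_state=0).fit(X).bic(X)
        for K in range(1, 10)}

best_K = min(bics, key=bics.get)
print(best_K)  # should land near 3 for this data, not at K = n
```

The likelihood term keeps improving as $K$ grows, but the $k\ln(n)$ penalty eventually dominates, which is exactly the tradeoff described above.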
However, you do not change both the prior and $k$ at the same time: for each prior there will be an optimal $k$, so varying both simultaneously is not really meaningful.
In a DPMM, the first term is meaningless because the number of parameters stays constant. The $k$ can be misleading here: it is the number of parameters in the model, not the number of clusters. For a GMM these are the mean and covariance of each cluster, so in one dimension $k$ will actually be about $2\times$ the number of clusters in the model (plus the free mixture weights); a general count is given below.
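As a side note (this count is standard, though not stated above), for a $d$-dimensional GMM with $K$ components and full covariance matrices:

$$k = \underbrace{Kd}_{\text{means}} + \underbrace{K\frac{d(d+1)}{2}}_{\text{covariances}} + \underbrace{(K-1)}_{\text{weights}},$$

which for $d = 1$ gives $k = 3K - 1$: the two parameters per cluster, plus $K - 1$ free mixture weights.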
In contrast, in a DPGMM the number of parameters in the model is not tied to the number of clusters; it is constant.
Thus only the likelihood term remains, making the BIC irrelevant: the DPGMM ideally finds the optimal number of clusters on its own.
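To make the contrast concrete, here is a minimal sketch with scikit-learn's `BayesianGaussianMixture`, which implements a truncated variational approximation to the DPGMM (the truncation level, concentration value, weight threshold, and toy data are my own illustrative choices). Instead of sweeping $K$ and scoring with the BIC, you set one generous upper bound and let the posterior shrink the unused components:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Same hypothetical toy data: three 1-D Gaussian clusters.
X = np.concatenate([
    rng.normal(-5, 1, 200),
    rng.normal(0, 1, 200),
    rng.normal(5, 1, 200),
]).reshape(-1, 1)

dpgmm = BayesianGaussianMixture(
    n_components=20,  # truncation level: an upper bound, not a model-selection knob
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.1,  # the concentration parameter mentioned above
    random_state=0,
).fit(X)

# Superfluous components get weights pushed toward zero; count the rest.
effective_K = np.sum(dpgmm.weights_ > 1e-2)
print(effective_K)  # ideally ~3, found without any BIC sweep
```

Note that the concentration parameter still influences how many clusters survive, which is exactly the point made earlier: for each prior there is an implied optimal number of clusters, so you tune the prior rather than $k$.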