A Dirichlet process mixture model (DPMM) is a non-parametric
model, though the name is a bit misleading: you still need to choose the base measure and the concentration parameter, which are parameters.
However, when you consider the BIC, the difference between the nonparametric and parametric models becomes more obvious.
Consider the BIC
equation:
$\mathrm{BIC} = k\ln(n) - 2\ln(\widehat{L})$
where $k$ is the number of estimated parameters, $n$ is the number of data points, and $\widehat{L}$ is the maximized likelihood of the model.
When working with the (finite) GMM formulation, you mostly use the BIC criterion to find the optimal number of clusters. There is a tradeoff: if you increase $k$, the first term rises while the second drops, since the likelihood of the model increases. This is what guarantees that a model with $K = N$
(i.e. a cluster for each point, the maximum-likelihood extreme) will not necessarily beat models with fewer clusters and a lower likelihood, as the sketch below illustrates.
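A minimal sketch of this sweep, using scikit-learn's `GaussianMixture` (the toy data and the range of $K$ are my own choices for illustration, not from the original answer):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical toy data: three well-separated 1-D Gaussian clusters.
X = np.concatenate([
    rng.normal(-5, 1, 200),
    rng.normal(0, 1, 200),
    rng.normal(5, 1, 200),
]).reshape(-1, 1)

# Sweep the number of components and record the BIC of each fit.
# sklearn's bic() computes k*ln(n) - 2*ln(L_hat); lower is better.
bics = {K: GaussianMixture(n_components=K, random_state=0).fit(X).bic(X)
        for K in range(1, 10)}

best_K = min(bics, key=bics.get)
print(best_K)  # should land near 3 for this data, not at K = n
```

The likelihood term keeps improving as $K$ grows, but the $k\ln(n)$ penalty eventually dominates, which is exactly the tradeoff described above.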
However, you do not change both the prior and $k$ at the same time: for each prior there will be an optimal $k$, so varying both simultaneously is not really meaningful.
In a DPMM, the first term is meaningless because the number of parameters stays constant. The $k$ can be misleading here: it is the number of parameters in the model, not the number of clusters. For a GMM these are the mean and covariance of each cluster, so in one dimension $k$ will actually be about $2\times$ the number of clusters in the model (plus the free mixture weights); a general count is given below.
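As a side note (this count is standard, though not stated above), for a $d$-dimensional GMM with $K$ components and full covariance matrices:

$$k = \underbrace{Kd}_{\text{means}} + \underbrace{K\frac{d(d+1)}{2}}_{\text{covariances}} + \underbrace{(K-1)}_{\text{weights}},$$

which for $d = 1$ gives $k = 3K - 1$: the two parameters per cluster, plus $K - 1$ free mixture weights.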
In contrast, in a DPGMM the number of parameters in the model is not tied to the number of clusters; it is constant.
Thus only the likelihood term remains, making the BIC irrelevant: the DPGMM ideally finds the optimal number of clusters on its own.
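To make the contrast concrete, here is a minimal sketch with scikit-learn's `BayesianGaussianMixture`, which implements a truncated variational approximation to the DPGMM (the truncation level, concentration value, weight threshold, and toy data are my own illustrative choices). Instead of sweeping $K$ and scoring with the BIC, you set one generous upper bound and let the posterior shrink the unused components:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Same hypothetical toy data: three 1-D Gaussian clusters.
X = np.concatenate([
    rng.normal(-5, 1, 200),
    rng.normal(0, 1, 200),
    rng.normal(5, 1, 200),
]).reshape(-1, 1)

dpgmm = BayesianGaussianMixture(
    n_components=20,  # truncation level: an upper bound, not a model-selection knob
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.1,  # the concentration parameter mentioned above
    random_state=0,
).fit(X)

# Superfluous components get weights pushed toward zero; count the rest.
effective_K = np.sum(dpgmm.weights_ > 1e-2)
print(effective_K)  # ideally ~3, found without any BIC sweep
```

Note that the concentration parameter still influences how many clusters survive, which is exactly the point made earlier: for each prior there is an implied optimal number of clusters, so you tune the prior rather than $k$.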