
For my dataset of ~19K data points to cluster, I want to use a criterion to choose the number of clusters. BIC (Bayesian Information Criterion) gives too few clusters (~180), while AIC (Akaike Information Criterion) gives too many (~1400). Intuitively, I feel that ~500 clusters would be optimal, putting ~40 data points in each cluster on average. But apparently I need a statistical explanation for choosing ~500. Is there a way to combine AIC and BIC so that we get neither too few nor too many clusters?

I am not asking about choosing one of AIC or BIC over the other. I already know that BIC penalizes the number of free parameters much more than AIC, but based on prior information about my data, I want a penalty that is not as high as BIC's and not as low as AIC's.

I could just select 500 clusters and go ahead, but reviewers of submitted papers always want some statistical justification for the choice of cluster count; that is why I need one.

Here are the formulas that I use for BIC and AIC:

BIC: $-2 \ln(L) + \ln(p) \times k \times n$

AIC: $-2 \times ln(L) + 2\times k\times n$

where

p = the number of data points to cluster
k = the number of clusters
n = the number of dimensions of each data point
L = the likelihood.
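To make the comparison concrete, here is a minimal sketch of these criteria in Python, using the convention (mentioned in the comments below) that for k-means $-\ln(L)$ is the total within-cluster sum of squares. The generalized form with a tunable penalty weight $\lambda$ recovers AIC at $\lambda = 2$ and BIC at $\lambda = \ln(p)$; the intermediate choice $\lambda = \sqrt{2\ln(p)}$ (the geometric mean of the two weights) is purely illustrative, not a standard criterion.

```python
import math

def criterion(wss, p, k, n, lam):
    """Generalized information criterion for k-means-style clustering.

    wss : total within-cluster sum of squares (stands in for -ln(L))
    p   : number of data points
    k   : number of clusters
    n   : number of dimensions per data point
    lam : penalty weight per free parameter;
          lam = 2 gives AIC, lam = ln(p) gives BIC.
    """
    return 2.0 * wss + lam * k * n

def aic(wss, p, k, n):
    return criterion(wss, p, k, n, 2.0)

def bic(wss, p, k, n):
    return criterion(wss, p, k, n, math.log(p))

def mid(wss, p, k, n):
    # Illustrative intermediate penalty: geometric mean of 2 and ln(p).
    return criterion(wss, p, k, n, math.sqrt(2.0 * math.log(p)))
```

For a fixed fit quality (same `wss`), the intermediate criterion always penalizes `k * n` free parameters more heavily than AIC and less heavily than BIC, which is exactly the behaviour asked for; the open question is whether any particular $\lambda$ between the two can be justified to a reviewer.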
user5054
  • For something canned, you might want to consider using ICL (Integrated Completed Likelihood - a classification-like version of BIC) or NEC (Normalised Entropy Criterion). – usεr11852 Jun 30 '16 at 01:23
  • 1
    Why not just choose 500 clusters? – Sycorax Jun 30 '16 at 03:48
  • 1
    Possible duplicate of [Is there any reason to prefer the AIC or BIC over the other?](http://stats.stackexchange.com/questions/577/is-there-any-reason-to-prefer-the-aic-or-bic-over-the-other) – Xi'an Jun 30 '16 at 06:04
  • You are presumably speaking of AIC/BIC _clustering criterions_? Please give their formulas or direct link to them and how they are used! So far it is unclear what you were doing. – ttnphns Jun 30 '16 at 06:43
  • @General Abrial, Xi'an, ttnphns, thanks for the comments. I updated the question accordingly. – user5054 Jun 30 '16 at 06:58
  • But AIC and BIC (original) themselves are not clustering criterions, they can't help choosing the number of clusters. There exist clustering criterions based on AIC or BIC. And I'm asking: bring in their formulas. Show what you are using, please! Display how you compute the number of clusters. – ttnphns Jun 30 '16 at 07:04
  • Updated the question, now have the formulas. – user5054 Jun 30 '16 at 08:19
  • 1
    Have you tried AICc? This is AIC with an extra term to penalise overfitting. https://en.wikipedia.org/wiki/Akaike_information_criterion#AICc – arboviral Jun 30 '16 at 08:26
  • A bit strange formulas. Where did you get them from? `log-likelihood which is the negative of the total intra-cluster sum of squares` Log-likelihood should itself imply a logarithm inside; but you then take logarithm one more time of it. [Here](http://stats.stackexchange.com/q/55147/3277) I gave computation of AIC and BIC clustering criterions as they are computed in TwoStep cluster analysis of SPSS. – ttnphns Jun 30 '16 at 08:35
  • @ttnphns That's right, the inside log is not needed. Corrected.. Thanks for the link! – user5054 Jun 30 '16 at 08:49
  • @GeneralAbrial: That would be a strong prior! :D – usεr11852 Jun 30 '16 at 09:14
  • As explained on stackoverflow.com/questions/15839774/…, for k-means the $-\ln(L)$ term in the BIC and AIC formulas is the k-means objective to minimize, which is the total intra-cluster sum of squares. I think this comes from the Gaussian distribution formula. – user5054 Jun 30 '16 at 10:36

0 Answers