It is generally considered that a "good partitioning" must satisfy one or more of the following criteria: (a) compactness (small within-cluster variation), (b) connectedness (neighbouring data points belong to the same cluster), and (c) spatial separation (which must be combined with other criteria, such as compactness or balance of cluster sizes). These criteria belong to a large battery of internal measures of cluster validity (internal meaning that we do not use additional knowledge about the data, such as a priori class labels). They can be complemented with so-called combination measures, which assess, for example, both intra-cluster homogeneity and inter-cluster separation (Dunn or Davies–Bouldin index, silhouette width, SD-validity index, etc.), with estimates of predictive power (self-consistency and stability of a partitioning), and with measures of how well distance information is reproduced in the resulting partitions (e.g., cophenetic correlation and Hubert's Gamma statistic).
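To make two of these combination measures concrete, here is a minimal sketch of Dunn's index and the average silhouette width, using only the Python standard library. The function names and the toy data are my own, for illustration only; the sketch assumes Euclidean distance, at least two clusters, and no cluster made up entirely of identical points.

```python
from math import dist
from statistics import mean

def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    n = len(points)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    between = [dist(points[i], points[j]) for i, j in pairs if labels[i] != labels[j]]
    within = [dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j]]
    return min(between) / max(within)

def silhouette_width(points, labels):
    """Average silhouette width over all points (singleton clusters score 0)."""
    clusters = sorted(set(labels))
    widths = []
    for i, p in enumerate(points):
        # Mean distance from p to the members of each cluster, excluding p itself.
        mean_dist = {}
        for c in clusters:
            members = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            if members:
                mean_dist[c] = mean(dist(p, q) for q in members)
        if labels[i] not in mean_dist:  # p is alone in its cluster
            widths.append(0.0)
            continue
        a = mean_dist[labels[i]]                                    # cohesion
        b = min(d for c, d in mean_dist.items() if c != labels[i])  # separation
        widths.append((b - a) / max(a, b))
    return mean(widths)

# Toy data: two tight, well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
good = [0, 0, 0, 1, 1, 1]   # labels that follow the two groups
bad = [0, 1, 0, 1, 0, 1]    # labels that cut across them

print(dunn_index(points, good), dunn_index(points, bad))
print(silhouette_width(points, good), silhouette_width(points, bad))
```

Both indices should come out clearly higher for the labelling that follows the two groups, which is the sense in which they reward compactness and separation together.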
A more complete review, and simulation results, are available in
Handl, J., Knowles, J., and Kell, D.B. (2005). Computational cluster validation in post-genomic data analysis. Bioinformatics, 21(15): 3201-3212.
I guess you could rely on some of them to compare your different cluster solutions and choose the feature set that yields the best indices. You can even use the bootstrap to estimate the variability of those indices (e.g., cophenetic correlation, Dunn's index, silhouette width), as was done by Tom Nichols and colleagues in a neuroimaging study, Finding Distinct Genetic Factors that Influence Cortical Thickness.
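As a rough illustration of the bootstrap idea, the sketch below resamples the observations with replacement, reclusters each resample, and recomputes Dunn's index, so that the spread of the resulting values gives an idea of its variability. This is a simplified scheme of my own, not the procedure from the study cited above: the small k-means routine with farthest-point initialisation, the function names, and the simulated data are all assumptions made for the example.

```python
import random
from math import dist
from statistics import mean, stdev

def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    n = len(points)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    between = [dist(points[i], points[j]) for i, j in pairs if labels[i] != labels[j]]
    within = [dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j]]
    return min(between) / max(within)

def kmeans_labels(points, k, n_iter=20):
    """Plain k-means; farthest-point initialisation keeps the example deterministic."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(n_iter):
        # Assign each point to its nearest centre, then move the centres.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(mean(xs) for xs in zip(*members))
    return labels

def bootstrap_index(points, k, n_boot=200, seed=0):
    """Dunn's index recomputed on n_boot bootstrap resamples of the data."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        sample = rng.choices(points, k=len(points))  # resample with replacement
        labels = kmeans_labels(sample, k)
        if len(set(labels)) == k:  # skip resamples where a cluster vanished
            values.append(dunn_index(sample, labels))
    return values

# Simulated data: two Gaussian blobs of 30 points each.
data_rng = random.Random(1)
def blob(cx, cy, n=30, sd=0.5):
    return [(data_rng.gauss(cx, sd), data_rng.gauss(cy, sd)) for _ in range(n)]

points = blob(0, 0) + blob(10, 10)
values = bootstrap_index(points, k=2)
print("bootstrap mean:", round(mean(values), 2), "sd:", round(stdev(values), 2))
```

The same loop can be run once per candidate feature set, so that the indices are compared together with their bootstrap spread rather than as single numbers.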
If you are using R, I warmly recommend taking a look at the fpc package, by Christian Hennig, which provides almost all of the statistical indices described above (cluster.stats()) as well as a bootstrap procedure (clusterboot()).
About the use of mutual information in clustering, I have no experience with it, but here is a paper that discusses its use in a genomic context (with a comparison to k-means):
Priness, I., Maimon, O., and Ben-Gal, I. (2007). Evaluation of gene-expression clustering via mutual information distance measure. BMC Bioinformatics, 8: 111.
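For what it is worth, here is a minimal sketch of what a mutual-information-based distance between two discretised profiles can look like. The normalisation used (one minus mutual information over joint entropy) is one common choice that I am assuming for illustration; it is not necessarily the definition used in the paper, and the function names are mine.

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired observations."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def mi_distance(xs, ys):
    """Assumed normalisation: 1 - I(X;Y) / H(X,Y); 0 means fully dependent."""
    joint = entropy(list(zip(xs, ys)))
    if joint == 0:  # both sequences are constant
        return 0.0
    return 1.0 - mutual_information(xs, ys) / joint

# Hypothetical discretised profiles (e.g. low / medium / high coded as 0 / 1 / 2).
x = [0, 0, 1, 1, 2, 2]
y = [2, 2, 0, 0, 1, 1]   # a relabelling of x: perfectly dependent on it
z = [0, 1, 0, 1, 0, 1]   # carries no information about x

print(mi_distance(x, y), mi_distance(x, z))
```

Unlike a correlation-based distance, this treats y as being as close as possible to x even though the relation between them is not monotone, which is the usual argument for considering mutual information in this setting.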