Cluster analysis: caring only about the number of clusters in the data, not what they are?

Question

I know some methods exist for choosing k for k-means or k-medoids, but they don't seem rigorous enough when I care only about k and not about what is in the clusters. Is there a rigorous process or algorithm for estimating the number of clusters in my data?

Check this introduction, too: https://stats.stackexchange.com/a/358937/3277. — ttnphns, Feb 29 '20 at 08:32

score 2 · Accepted Answer · answered Feb 29 '20 at 01:35

Yes, and it's a very well-developed field. Methods for estimating the optimal number of clusters in a data set fall under "cluster validity": a validity index scores a candidate partition, and k is chosen as the value that gives the best score.

See:

N. Speer, C. Spieth, and A. Zell. Biological cluster validity indices based on the gene ontology. Lecture Notes in Computer Science, 3646:429--439, 2005.

D. Davies and D. Bouldin. A cluster separation measure. IEEE Trans. on Pattern Analysis and Machine Intelligence, 1(2):224--227, 1979.

J. Dunn. Well separated clusters and optimal fuzzy partitions. J. Cybernetics, 4:95--104, 1974.

P. Rousseeuw. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20:53--65, 1987.

M. Gonzalez Toledo. A comparison in cluster validation techniques. Master's thesis, University of Puerto Rico - Mayaguez Campus, 2005.

N. Bolshakova and F. Azuaje. Cluster validation techniques for genome expression data. Signal Process., 83(4):825--833, 2003.
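
To make the idea concrete, here is a minimal sketch of choosing k with one of the indices cited above, the silhouette (Rousseeuw, 1987). It is an illustration under stated assumptions, not a rigorous procedure: the helper names `kmeans` and `mean_silhouette`, the farthest-point initialisation, and the toy data are all made up for this example. In practice a library routine such as scikit-learn's `silhouette_score` would likely be used instead.

```python
import math

def kmeans(points, k, iters=50):
    """Tiny Lloyd-style k-means with a deterministic farthest-point start.
    (Illustrative helper, not a production implementation.)"""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members)
                                         for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the centre of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def mean_silhouette(points, labels):
    """Average silhouette width: s = (b - a) / max(a, b) per point."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data (an assumption for this sketch): three tight, well-separated groups.
offsets = [(0, 0), (0.5, 0), (-0.5, 0), (0, 0.5), (0, -0.5)]
blob_centers = [(0, 0), (10, 0), (0, 10)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

# Score each candidate k and keep the one with the highest mean silhouette.
scores = {k: mean_silhouette(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k)  # 3 for this toy data
```

Only the score per k is used here; the cluster memberships are discarded, which matches the case where k itself is the quantity of interest. On real data the indices in the references above can disagree with one another, so comparing several of them is a common precaution.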

Thank you so much. This will help me get started. — Sherif Negm, Feb 29 '20 at 02:24
