I am trying to cluster numeric data, starting from a distance matrix, but I am not sure how to choose the number of clusters (the value of k) for the clara function in R. After running it with an arbitrary number of clusters, I ran the silhouette function on the result, and summary gives me the following:
Cluster sizes and average silhouette widths:
           7            3            4            5            7            4
 0.222273330 -0.001592881  0.117937463  0.121326365  0.137911639  0.161932689
Individual silhouette widths:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-0.10410  0.08961  0.12500  0.14140  0.19840  0.30580
This is the result for k=6. If I change k to, say, 5 or 4, I again get a silhouette width for each cluster and an overall mean. How do I decide on the number of clusters? Should I plot the mean silhouette width against k and pick the k that maximizes it? And how would this be done on a large dataset with around a million observations?
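For concreteness, the mean-silhouette-versus-k comparison could be sketched roughly as below. This is only an illustration under stated assumptions: the data `x` is synthetic stand-in data rather than my real data, the range `2:8` and `samples = 20` are arbitrary choices, and it assumes the raw observations are available as a numeric matrix, since clara works from the data itself rather than from a precomputed distance matrix. It relies on the `silinfo$avg.width` component of the clara result, which is computed on the best subsample only, so it should stay cheap even when the full dataset is large.

```r
library(cluster)

set.seed(42)
# Synthetic stand-in data (assumption): three Gaussian blobs in 2-D
x <- rbind(
  matrix(rnorm(200, mean = 0),  ncol = 2),
  matrix(rnorm(200, mean = 5),  ncol = 2),
  matrix(rnorm(200, mean = 10), ncol = 2)
)

# Candidate numbers of clusters (arbitrary range for illustration)
ks <- 2:8

# For each k, fit clara and keep the average silhouette width
# of its best subsample
avg_sil <- sapply(ks, function(k) {
  fit <- clara(x, k, samples = 20)
  fit$silinfo$avg.width
})

# Candidate k with the largest average silhouette width
best_k <- ks[which.max(avg_sil)]

plot(ks, avg_sil, type = "b",
     xlab = "Number of clusters k",
     ylab = "Average silhouette width")
```

Because the silhouette here comes from a subsample, the curve is an estimate; increasing `samples` or `sampsize` in clara trades run time for a more stable estimate.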