
I was looking back at this tutorial (https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html) and it struck me that the example with n_clusters=4, the one I would have chosen "visually", was not the one with the highest silhouette score. Instead, n_clusters=2 scored highest, which is not what I would have chosen.

Below are the scores (taken verbatim from the tutorial):

For n_clusters = 2 The average silhouette_score is : 0.7049787496083262
For n_clusters = 3 The average silhouette_score is : 0.5882004012129721
For n_clusters = 4 The average silhouette_score is : 0.6505186632729437
For n_clusters = 5 The average silhouette_score is : 0.56376469026194
For n_clusters = 6 The average silhouette_score is : 0.4504666294372765
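
For reference, scores like the ones above can be computed with a short script. This is only a minimal sketch: the `make_blobs` settings and `random_state=10` are what I believe the tutorial uses, `n_init=10` is my own assumption, and the exact numbers may differ slightly between scikit-learn versions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 2-D data: 4 blobs (settings assumed to match the tutorial).
X, _ = make_blobs(
    n_samples=500,
    n_features=2,
    centers=4,
    cluster_std=1,
    center_box=(-10.0, 10.0),
    shuffle=True,
    random_state=1,
)

scores = {}
for k in range(2, 7):
    # n_init=10 is an assumption; newer scikit-learn versions default to "auto".
    labels = KMeans(n_clusters=k, n_init=10, random_state=10).fit_predict(X)
    # Mean silhouette coefficient over all samples, in [-1, 1].
    scores[k] = silhouette_score(X, labels)
    print(f"For n_clusters = {k} The average silhouette_score is : {scores[k]}")

# The k that "wins" if you only look at the average silhouette score.
best_k = max(scores, key=scores.get)
print("Highest average silhouette score at n_clusters =", best_k)
```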

What am I missing? What would I need to change so that n_clusters=4 is the winning option?

Asher11
  • 1) [Criterion vs eye. If data are interval, clusters are not infrequently discernible visually...](https://stats.stackexchange.com/a/358937/3277). 2) There exist 100+ different internal clustering criteria. Silhouette is only one of them. Why not try others? Why not try, say, another _version_ of the Silhouette criterion, called "Deviation Silhouette", aka "Simplified Silhouette"? – ttnphns Oct 16 '21 at 10:20
  • Yes, in the end I tried a combination of different scores and it's working much better! – Asher11 Oct 18 '21 at 11:43
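
To illustrate the "combination of different scores" idea from the comments, here is a rough sketch that puts the silhouette score next to two other internal criteria available in scikit-learn (Calinski-Harabasz and Davies-Bouldin). Which criteria were actually combined is not stated in the thread, so this particular choice is an assumption, as are the data settings and `n_init=10`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Same kind of synthetic data as in the tutorial (settings assumed).
X, _ = make_blobs(
    n_samples=500,
    n_features=2,
    centers=4,
    cluster_std=1,
    center_box=(-10.0, 10.0),
    shuffle=True,
    random_state=1,
)

results = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=10).fit_predict(X)
    results[k] = {
        "silhouette": silhouette_score(X, labels),                # higher is better
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
        "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
    }

# Report which k each criterion prefers; they do not have to agree.
best = {
    "silhouette": max(results, key=lambda k: results[k]["silhouette"]),
    "calinski_harabasz": max(results, key=lambda k: results[k]["calinski_harabasz"]),
    "davies_bouldin": min(results, key=lambda k: results[k]["davies_bouldin"]),
}
for name, k in best.items():
    print(f"{name}: preferred n_clusters = {k}")
```

If the criteria disagree, that is a hint to look at the per-cluster silhouette plots (or the data itself) rather than trusting a single average.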
