I'm clustering genes based on gene expression data. Here's a hierarchically clustered heatmap using Ward linkage and Euclidean distance:
It clearly shows 5 or 6 clusters. However, when I compute the silhouette score on labels obtained from scipy's fcluster, I get a decreasing curve like this:
The Davies-Bouldin (DB) scores, where lower is better, increase with the number of clusters, although there are slight dips going from 4 to 5, 7 to 8, and 9 to 10 clusters.
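For reference, here is a minimal sketch of how this kind of evaluation can be set up. It is not my exact script: the data is a synthetic stand-in for the expression matrix, the helper name score_cuts is made up for illustration, and I'm assuming scikit-learn's silhouette_score and davies_bouldin_score for the two indices, with the tree cut via fcluster using criterion="maxclust".

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score, davies_bouldin_score


def score_cuts(X, k_values):
    """Build one Ward tree, cut it at each k, and score every flat clustering.

    Returns a dict mapping k -> (silhouette, davies_bouldin).
    """
    # Ward linkage on Euclidean distances, as in the heatmap
    Z = linkage(X, method="ward", metric="euclidean")
    results = {}
    for k in k_values:
        # Ask for at most k flat clusters from the same tree
        labels = fcluster(Z, t=k, criterion="maxclust")
        sil = silhouette_score(X, labels, metric="euclidean")  # higher is better
        db = davies_bouldin_score(X, labels)                   # lower is better
        results[k] = (sil, db)
    return results


# Synthetic stand-in (assumption): 3 groups of 30 "genes" measured in 8 "samples"
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 8))
X = np.vstack([c + rng.normal(size=(30, 8)) for c in centers])

scores = score_cuts(X, range(2, 11))
for k, (sil, db) in scores.items():
    print(f"k={k:2d}  silhouette={sil:.3f}  DB={db:.3f}")
```

Both indices are computed on the same rows and the same Euclidean metric that were used to build the tree, so the scores and the dendrogram are at least looking at the same distances.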
My question is: should I take this curve as "proof" that 5 or 8 clusters are better, even though the plot shows they are only better relative to their neighbors? Or should I conclude that 2 clusters are best, even though the heatmap shows otherwise? Why doesn't the structure visible in the heatmap translate into good scores on both indices?