I am researching cluster analysis for data that mix categorical and continuous variables, for which I have read that Gower's similarity coefficient is a good proximity measure. I would like to start with an average linkage algorithm, and I have seen recommendations to look for the 'elbow' in the sum of squared error (SSE) scree plot as a guideline for deciding how many clusters to retain. Does Gower's similarity coefficient (being non-metric and non-Euclidean) allow me to create an SSE scree plot, or does that not make sense statistically?

ttnphns
Laura

1 Answer

SSE is the measure optimized by k-means.

It doesn't make much sense for any algorithm other than k-means. And even there it suffers from the fact that increasing k will always decrease SSE, so all you can do is look for the point at which further increasing k stops yielding a substantial decrease in SSE - that is essentially the vague "elbow method".
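To illustrate why SSE alone cannot pick k, here is a minimal sketch on hypothetical one-dimensional toy data. It assumes the 1-D case, where the SSE-optimal clusters are contiguous runs of the sorted values, so the optimum can be found by brute force over cut points; the helper names `sse` and `best_sse` are made up for this example and are not from any library.

```python
from itertools import combinations


def sse(cluster):
    """Sum of squared deviations of the values from their cluster mean."""
    mean = sum(cluster) / len(cluster)
    return sum((x - mean) ** 2 for x in cluster)


def best_sse(values, k):
    """Smallest total SSE over all splits of the sorted 1-D values
    into k contiguous groups (brute force, only sensible for tiny data)."""
    xs = sorted(values)
    n = len(xs)
    best = float("inf")
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        total = sum(sse(xs[a:b]) for a, b in zip(bounds, bounds[1:]))
        best = min(best, total)
    return best


# Hypothetical toy data: three well-separated groups on a line.
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7, 9.0, 9.1, 8.9]

sse_by_k = {k: best_sse(data, k) for k in range(1, len(data) + 1)}
for k, value in sse_by_k.items():
    print(k, round(value, 3))
```

The printed SSE never goes up as k grows and reaches zero when every point is its own cluster, so the only usable signal is where the drops become small (here, after k = 3).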

There exist other criteria such as Silhouette, Davies-Bouldin index, BIC, AIC that can be used to get an "alternative view" of what is actually optimal.
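Of these, the silhouette is defined purely in terms of pairwise dissimilarities, whereas SSE is defined via cluster means, so the silhouette can be computed directly from a dissimilarity matrix such as 1 minus Gower's similarity. A minimal sketch, assuming a small hand-made matrix (the values below are hypothetical, not computed from real data) and a made-up helper name:

```python
def silhouette_from_dissimilarities(d, labels):
    """Mean silhouette width from a square dissimilarity matrix d
    (list of lists) and one cluster label per object."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    widths = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean dissimilarity to the other members of the own cluster
        a = sum(d[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean dissimilarity to any other cluster
        b = min(
            sum(d[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


# Hypothetical Gower-style dissimilarities in [0, 1] for four objects.
d = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]

good = silhouette_from_dissimilarities(d, [0, 0, 1, 1])
bad = silhouette_from_dissimilarities(d, [0, 1, 0, 1])
print(good, bad)
```

The partition that groups the mutually close objects scores higher than the one that splits them. The labels could come from any method, for example cutting an average linkage tree at different heights and comparing the resulting mean silhouette widths.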

But in the end, each of these is just a mathematical heuristic. It may not work for real data.

Nick Cox
Has QUIT--Anony-Mousse