
Homogeneity of clusters can easily be measured by calculating the sum of squared errors (SSE): $$SSE = \sum_k \sum_{i \in c_k} \| x_i - \overline{c_k} \|^2$$ where $\overline{c_k}$ is the mean vector of cluster $k$. A disadvantage of this measure might be that it favors compact clusters. Another idea to measure cluster homogeneity is the following: $$H = \sum_k \sum_{i \in c_k} \sum_{j \in c_k, j \neq i} \| x_i - x_j \|^2$$ The measure $H$ reflects the pairwise squared distances between the elements of each cluster. It is not based on the cluster mean, and therefore does not favor compact clusters.

Does such a measure already exist in the literature? If so, why is it not used more frequently?

Funkwecker
  • The within-cluster SSE is definitely related to the within-cluster sum of pairwise d^2 when we speak of Euclidean distances. So your two formulas are informationally equivalent, if you take the number of points in a cluster into account. See my answer http://stats.stackexchange.com/a/81494/3277. – ttnphns Jan 15 '14 at 08:16
  • Thank you very much for your comment. That actually answered my question. I voted for your answer in the mentioned thread. – Funkwecker Jan 15 '14 at 08:24
  • My main concern with SSE is that it assumes that **all attributes have the same meaning** and scale. It works well on toy data sets, but often fails badly on real data. – Has QUIT--Anony-Mousse Jan 17 '14 at 09:01
  • Possible duplicate of [Why does k-means clustering algorithm use only Euclidean distance metric?](http://stats.stackexchange.com/questions/81481/why-does-k-means-clustering-algorithm-use-only-euclidean-distance-metric) – justhalf Oct 03 '16 at 02:53
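
The equivalence mentioned in the first comment can be checked numerically. For squared Euclidean distances, the pairwise sum within one cluster equals the cluster's SSE scaled by twice its size: $$\sum_{i \in c_k} \sum_{j \in c_k, j \neq i} \| x_i - x_j \|^2 = 2 n_k \sum_{i \in c_k} \| x_i - \overline{c_k} \|^2$$ where $n_k$ (a symbol not used in the question) is the number of points in cluster $k$. The following is a minimal sketch under these assumptions: the data and labels are random and purely illustrative, and the function names `sse`, `pairwise_h`, and `size_weighted_sse` are made up for this example.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as lists."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def cluster_mean(points):
    """Component-wise mean vector of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def group_by_label(X, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = {}
    for x, k in zip(X, labels):
        clusters.setdefault(k, []).append(x)
    return clusters


def sse(X, labels):
    """SSE from the question: squared distances to the cluster means."""
    total = 0.0
    for points in group_by_label(X, labels).values():
        m = cluster_mean(points)
        total += sum(sq_dist(p, m) for p in points)
    return total


def pairwise_h(X, labels):
    """H from the question: squared distances over ordered pairs i != j."""
    total = 0.0
    for points in group_by_label(X, labels).values():
        for i, p in enumerate(points):
            for j, q in enumerate(points):
                if i != j:
                    total += sq_dist(p, q)
    return total


def size_weighted_sse(X, labels):
    """Per-cluster SSE weighted by 2 * n_k (hypothetical helper)."""
    total = 0.0
    for points in group_by_label(X, labels).values():
        m = cluster_mean(points)
        total += 2 * len(points) * sum(sq_dist(p, m) for p in points)
    return total


# Illustrative random data: 60 points in 3 dimensions, 3 arbitrary clusters.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(60)]
labels = [random.randrange(3) for _ in range(60)]

# The two quantities should agree up to floating-point error.
print(math.isclose(pairwise_h(X, labels), size_weighted_sse(X, labels), rel_tol=1e-9))
```

So $H$ is a size-weighted version of SSE rather than a different kind of measure: larger clusters simply count more heavily, and dividing each cluster's pairwise sum by $2 n_k$ recovers the ordinary SSE.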
