3

Suppose clustering has been performed. I want to understand what characterizes a given cluster: which variables differ most between that cluster and the others? Are there any methods for this kind of analysis?

For example, say I have 50 variables and 10 clusters. The desired result would read something like: for cluster 1, variables 5, 16, 23, 42 and 49 contribute most to its difference from the other clusters.

mpiktas
danas.zuokas

2 Answers

4

You can always treat the clusters as classes and train a classifier on them.

A decision tree classifier comes to mind, as it will often produce a "human-readable" classification tree.
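As a rough sketch of that idea, the snippet below fits one shallow tree per cluster (that cluster versus all the others) and ranks variables by the tree's feature importances. It assumes scikit-learn and NumPy are available; the synthetic data, the variable names `x0` to `x5`, the helper `describe_cluster` and the choice of `max_depth=2` are all illustrative assumptions, not part of the original question.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Hypothetical data: 300 rows, 6 variables. Only x0 and x3 carry cluster
# structure; the remaining variables are pure noise.
X = rng.normal(size=(300, 6))
X[:100, 0] += 8.0      # first group is shifted along x0
X[100:200, 3] += 8.0   # second group is shifted along x3
names = [f"x{i}" for i in range(6)]

# Any clustering method could be used here; k-means is just a stand-in.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)


def describe_cluster(X, labels, cluster, names, max_depth=2):
    """Fit a shallow one-vs-rest tree and rank variables by importance."""
    y = (labels == cluster).astype(int)  # 1 = in this cluster, 0 = elsewhere
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X, y)
    ranked = sorted(zip(names, tree.feature_importances_), key=lambda p: -p[1])
    return tree, ranked


for c in range(3):
    tree, ranked = describe_cluster(X, labels, c, names)
    used = [(name, round(float(w), 2)) for name, w in ranked if w > 0]
    print(f"cluster {c}: variables used by the tree -> {used}")
    print(export_text(tree, feature_names=names))
```

The printed rules give the human-readable description, and the variables with non-zero importance are the ones the tree needed to separate that cluster from the rest. A shallow tree keeps the description short but may miss variables that matter only in combination.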

Has QUIT--Anony-Mousse
0

An interesting topic to look into is fuzzy clustering.

The idea behind fuzzy clustering is that a cluster can have elements that are more "typical" than others and therefore have a higher membership within that cluster than other members which are less "typical".

For example, a Robin would have a high value (say 0.9) for its membership in the Birds cluster, while a Penguin would have a low value (say 0.4) for its membership in the Birds cluster.
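To make the notion of graded membership concrete, here is a minimal sketch of the membership formula used by fuzzy c-means, one common fuzzy clustering algorithm. The cluster centers, the two sample points and the fuzzifier `m=2.0` are made-up values for illustration. Note that in fuzzy c-means a point's memberships across all clusters sum to 1, so the numbers are relative to the other clusters rather than an absolute "typicality" score.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the
    distance from the point to center i and m > 1 is the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# Hypothetical cluster centers and points (not taken from any real data set).
centers = [(0.0, 0.0), (10.0, 0.0)]
typical = (1.0, 0.0)    # close to the first center
atypical = (4.0, 0.0)   # between the two centers

print("typical: ", fcm_memberships(typical, centers))
print("atypical:", fcm_memberships(atypical, centers))
```

The point near the first center gets a membership close to 1 in that cluster, while the point between the centers gets a noticeably lower one, which mirrors the Robin versus Penguin contrast above.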

With 50 data items (and 10 clusters!) to cluster, a good inference might be difficult to obtain.

A good book on fuzzy clustering and fuzzy logic is Fuzzy Sets and Fuzzy Logic: Theory and Applications by George J. Klir and Bo Yuan, although it is a bit dated.

Andrew