How to determine which variables make a certain cluster different from the others?

Question

Say clustering has been performed. I want to understand what characterizes a particular cluster. Which variables differ most for that cluster? Are there any methods for this kind of analysis?

For example, say I have 50 variables and 10 clusters. The desired result would read something like: for cluster 1, variables 5, 16, 23, 42 and 49 contribute most to its difference from the other clusters.

score 4 · Accepted Answer · answered May 10 '12 at 08:30

You can always treat the clusters as classes and train a classifier on them.

A decision tree classifier comes to mind, as it will often produce a "human-readable" classification tree.
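As a rough sketch of this idea (not a definitive recipe), the snippet below fits a shallow one-vs-rest decision tree for a single cluster and ranks the variables by the tree's impurity-based importance. It assumes scikit-learn is available. The synthetic data, the use of KMeans, and the names `describe_cluster`, `feature_names` and `ranked` are all made up for illustration; in practice you would plug in your own data matrix and cluster labels.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Synthetic data (assumption, for illustration only): 300 rows, 6 variables.
# The first 100 rows are shifted on x0, the next 100 on x3, the rest are unshifted.
X = rng.normal(size=(300, 6))
X[:100, 0] += 6.0
X[100:200, 3] += 6.0
feature_names = [f"x{i}" for i in range(X.shape[1])]

# Any clustering will do; KMeans is used here only to obtain labels.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)


def describe_cluster(X, labels, cluster, max_depth=3):
    """Fit a 'this cluster vs. all others' tree and rank the variables it uses."""
    y = (labels == cluster).astype(int)  # 1 = in the cluster, 0 = everything else
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X, y)
    importances = tree.feature_importances_
    order = np.argsort(importances)[::-1]  # most important variable first
    ranked = [(feature_names[i], float(importances[i]))
              for i in order if importances[i] > 0]
    return tree, ranked


# Describe the cluster that contains the first row (one of the x0-shifted rows).
tree, ranked = describe_cluster(X, labels, cluster=labels[0])
print(ranked)
print(export_text(tree, feature_names=feature_names))
```

The printed rules give the "human-readable" description, and the ranked list is one possible answer to "which variables set this cluster apart". Keeping `max_depth` small trades accuracy for readability, and impurity-based importances are only one of several ways to rank variables.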

Has QUIT--Anony-Mousse

I like this approach! Is there any literature on it I can cite? – Abraham D Flaxman Mar 09 '15 at 20:25

score 0 · Answer 2 · answered May 10 '12 at 12:22

An interesting topic to look into is fuzzy clustering.

The idea behind fuzzy clustering is that a cluster can have elements that are more "typical" than others and therefore have a higher membership within that cluster than other members which are less "typical".

For example, a Robin would have a high value (say 0.9) for its membership in the Birds cluster, while a Penguin would have a low value (say 0.4) for its membership in the Birds cluster.
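To make the notion of graded membership concrete, here is a minimal sketch of the membership formula used by fuzzy c-means, which is one common fuzzy clustering algorithm and not necessarily the one meant above. The function name `fuzzy_memberships`, the fixed `centers`, and the toy points are assumptions for illustration. Note that in this particular formulation each point's memberships sum to 1 across clusters.

```python
import numpy as np


def fuzzy_memberships(X, centers, m=2.0):
    """Fuzzy c-means style memberships of each row of X in each cluster.

    u[i, j] = d[i, j]^(-2/(m-1)) / sum_k d[i, k]^(-2/(m-1)),
    where d[i, j] is the Euclidean distance from point i to center j
    and m > 1 is the "fuzzifier" (larger m gives softer memberships).
    """
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)  # avoid division by zero when a point sits on a center
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


# Two hypothetical cluster centers and two points at different distances from them.
centers = np.array([[0.0, 0.0], [10.0, 0.0]])
points = np.array([[1.0, 0.0],   # close to the first center: a "typical" member
                   [4.0, 0.0]])  # between the centers: a less "typical" member

U = fuzzy_memberships(points, centers)
print(U)  # one row per point, one column per cluster
```

The first point gets a membership close to 1 in the first cluster, while the second point's membership is split more evenly, mirroring the Robin versus Penguin contrast.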

With 50 data items (and 10 clusters!) to cluster, a good inference might be difficult to obtain.

A good book to read on fuzzy clustering and fuzzy logic is Fuzzy Sets and Fuzzy Logic: Theory and Applications by George J. Klir and Bo Yuan, although it is a bit dated.
