I have a dataset with only ordinal binary variables (asymmetric categories: present vs. absent). I have read about (dis)similarity measures for binary data, but not about which clustering method (e.g. Ward) to use. In SPSS, there is a warning about using a (dis)similarity measure other than squared Euclidean distance. But I have also read that Ward should be the best method. So which method and measure should I use?
-
With binary data, it is generally considered incorrect to use methods that compute cluster centroids (Ward, centroid, median, and some others); please read this [reminder list](http://stats.stackexchange.com/a/63549/3277). (The opposite opinion isn't necessarily heresy, though: the question is debatable.) Use the single, complete, or average linkage methods. (Between-group) average linkage (UPGMA) is the most "universal" clustering method. Where did you read that Ward is "the best"? Best for what, and when? – ttnphns May 18 '15 at 20:17
-
Use one of the many (dis)similarity measures for binary data that you've read about. Jaccard is perhaps the most popular, but there are many others. You should not use the Ward, centroid, or median methods with these measures. – ttnphns May 18 '15 at 20:23
-
OK, so I could, for instance, use the single linkage method with the Jaccard measure? And do you have a reference for this information, so that I could cite it? – Inge May 19 '15 at 19:49
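For readers working outside SPSS, the combination discussed in these comments (a binary dissimilarity measure plus a non-centroid linkage) might look roughly like the sketch below. It is only an illustration under assumptions: the toy matrix `X` is made up, SciPy is assumed to be available, and average linkage with a two-cluster cut is chosen just for the example.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: rows are cases, columns are binary attributes
# (True = present, False = absent).
X = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
], dtype=bool)

# Jaccard dissimilarity ignores joint absences, which suits asymmetric
# present/absent variables.
d = pdist(X, metric="jaccard")

# Average linkage (UPGMA) works directly on the dissimilarities and does
# not compute cluster centroids.
Z = linkage(d, method="average")

# Cut the tree into two clusters (an arbitrary choice for this toy data).
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Replacing `method="average"` with `"single"` or `"complete"` gives the other two linkages recommended above.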