Choice of clustering method with frequency data

Asked Sep 20 '17 at 21:53

Active Sep 21 '17 at 02:20

Viewed 748 times

We have a dataset of subgroups of the bacteria E. coli. We have frequency data for four subgroups in 38 locations and we want to cluster these locations by using the frequencies of subgroups which occur in each.

Initially we used two clustering methods:

Euclidean distance using the dist{stats} function in R; and
clustering with FactoMineR, using metric="euclidean" and method="ward".

Following this we received feedback that "median clustering would be better due to non-parametric aspects of data". Does this make sense?

edited Sep 21 '17 at 02:20

ttnphns


asked Sep 20 '17 at 21:53

LostBiologist

No. [Chi-sq or phi distance](https://stats.stackexchange.com/a/173669/3277) would be preferable for count data, together with a non-geometric linkage in clustering, such as the average or complete methods. Note also that the "median" method is not actually about the median within a cluster; it is about how the centroid is defined, and it has nothing to do with "nonparametric analysis". See https://stats.stackexchange.com/a/217742/3277. – ttnphns Sep 21 '17 at 02:17
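To make the comment's suggestion concrete, here is a minimal sketch of a chi-square distance between locations, followed by an average-linkage distance between two groups of locations. It is written in plain Python with no dependencies, not in R as in the question. The counts are made up for illustration, the function names `chi_square_distances` and `average_linkage` are hypothetical, and the formula assumes the correspondence-analysis form of the chi-square distance (row profiles weighted by inverse column masses), which may differ in detail from the variant the linked answer has in mind.

```python
from math import sqrt


def chi_square_distances(counts):
    """Pairwise chi-square distances between the rows of a count table.

    Assumes every row total and every column total is non-zero.
    """
    grand_total = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    # Column masses: share of the grand total falling in each subgroup.
    col_masses = [sum(col) / grand_total for col in zip(*counts)]
    # Row profiles: each location's counts as proportions of its own total.
    profiles = [
        [n / total for n in row] for row, total in zip(counts, row_totals)
    ]
    size = len(counts)
    dist = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            squared = sum(
                (a - b) ** 2 / mass
                for a, b, mass in zip(profiles[i], profiles[j], col_masses)
            )
            dist[i][j] = dist[j][i] = sqrt(squared)
    return dist


def average_linkage(dist, cluster_a, cluster_b):
    """Average-linkage distance: mean of all between-cluster pairwise distances."""
    pairs = len(cluster_a) * len(cluster_b)
    return sum(dist[i][j] for i in cluster_a for j in cluster_b) / pairs


# Made-up counts: 4 locations (rows) x 4 subgroups (columns).
counts = [
    [30, 10, 5, 5],
    [60, 20, 10, 10],
    [5, 5, 10, 30],
    [10, 10, 20, 60],
]
dist = chi_square_distances(counts)
between = average_linkage(dist, [0, 1], [2, 3])
```

In this toy table, locations 0 and 1 have the same subgroup proportions and differ only in total count, so their chi-square distance is zero; the same holds for locations 2 and 3. Euclidean distance on the raw counts would instead separate them by sample size. Average linkage needs only the pairwise distance matrix, which is why it pairs naturally with a non-Euclidean distance, whereas centroid-based methods such as Ward or median assume a geometry in which centroids are meaningful.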


0 Answers