
I am looking at zooplankton community assemblages using hierarchical cluster analysis, indicator species analysis, and non-metric multidimensional scaling based on Bray-Curtis dissimilarities. My input is a species abundance (log10(n+0.01)+1 transformed) by sample matrix.

My question relates to the input for the hierarchical cluster analysis. Can I use the Bray-Curtis dissimilarity output for my hierarchical cluster analysis with Ward's method (ward.D2), or does the fact that Bray-Curtis is non-Euclidean violate assumptions I haven't wrapped my head around yet?

Using Bray-Curtis with the HCA and ISA produces results that make quite a bit of ecological sense, but I want to make sure I haven't tricked myself into thinking I know something I don't actually know (a la Pirsig).

Here is my R code.

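# Load packages: vegan for vegdist(), indicspecies for multipatt()
library(vegan)
library(indicspecies)
# Get dissimilarity matrix from vegan's Bray-Curtis (the default method of vegdist)
d <- vegdist(pwslog, method = "bray")
# Hierarchical cluster analysis (hclust comes from the stats package, not vegan)
hpws <- hclust(d, method = "ward.D2")
# Define 6 groups from the HCA
groups.6 <- cutree(hpws, k = 6)
# Run indicspecies' Indicator Species Analysis on those groups and look at results
indval <- multipatt(pwslog, groups.6,
                    duleg = TRUE, control = how(nperm = 999))
summary(indval)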

Thank you for your time and suggestions.

mdewey
Metridia
  • 1
    True, Ward's [linkage method](http://stats.stackexchange.com/a/217742/3277) (and some others, such as centroid) is _theoretically_ only for Euclidean distance. It computes centroids assuming the distances are Euclidean. Bray-Curtis isn't such a distance. Well, if the distance matrix of your specific dataset can reasonably well [converge in Euclidean space](http://stats.stackexchange.com/a/69206/3277) (its [double centred](http://stats.stackexchange.com/a/12882/3277) matrix has only very small negative eigenvalues) then your observed distances (to cont.) – ttnphns Sep 03 '16 at 11:46
  • 1
    (cont.) are de facto close to being Euclidean and you might use Ward. But that conclusion is confined to your given distance matrix and does not hold for Bray-Curtis distance in general, so you can't extrapolate your clustering results onto a population. – ttnphns Sep 03 '16 at 11:46
  • 1
    See [these points](http://stats.stackexchange.com/a/195481/3277), which might guide you in selecting a clustering method. If you want a clustering method which is based on the notion of a "central point" or type, but your distances aren't Euclidean and hence you are wary of using "Euclidean" methods such as Ward, you might consider medoid clustering (PAM; it isn't hierarchical). – ttnphns Sep 03 '16 at 11:55
  • 1
    Btw, you are not saying if your initial data (from which Bray-Curtis matrix is computed) is quantitative or binary. – ttnphns Sep 03 '16 at 11:57
  • 1
    What are the results of other clustering methods? Do they give similar results? – Archie Dec 13 '16 at 14:24
  • 1
    @ttnphns, why not develop your comments into an official answer? (I find the point that this might be able to be done, but you wouldn't be able to generalize it, to be especially insightful.) Otherwise, this thread is officially *unanswered*. – gung - Reinstate Monica Jul 11 '20 at 13:47
  • 1
    UPGMA analysis or single linkage – Dr.Alaa Mahgoub Jul 11 '20 at 11:46
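
The eigenvalue check described in the comments can be sketched in base R. This is only an illustration under stated assumptions: `toy` is hypothetical simulated data standing in for `pwslog`, it uses a non-negative log10(n + 1) transform rather than the one in the question, and `bray_curtis()` and `euclid_check()` are helper names invented here, not functions from vegan or any other package.

```r
# Hypothetical stand-in for the abundance matrix (samples in rows, species in columns).
set.seed(1)
toy <- matrix(rpois(12 * 8, lambda = 5), nrow = 12)
toy <- log10(toy + 1)  # assumed non-negative transform, not the one in the question

# Bray-Curtis dissimilarity written out in base R so the sketch has no dependencies:
# BC(j, k) = sum(|x_j - x_k|) / sum(x_j + x_k)
bray_curtis <- function(x) {
  n <- nrow(x)
  out <- matrix(0, n, n)
  for (j in seq_len(n)) {
    for (k in seq_len(n)) {
      out[j, k] <- sum(abs(x[j, ] - x[k, ])) / sum(x[j, ] + x[k, ])
    }
  }
  as.dist(out)
}

# Eigenvalues of the double-centred matrix B = -1/2 * J D^2 J, where
# J = I - (1/n) * 11'. Negative eigenvalues mean the dissimilarities
# cannot be embedded exactly in Euclidean space.
euclid_check <- function(d) {
  D2 <- as.matrix(d)^2
  n <- nrow(D2)
  J <- diag(n) - matrix(1 / n, n, n)
  ev <- eigen(-0.5 * J %*% D2 %*% J, symmetric = TRUE, only.values = TRUE)$values
  list(
    eigenvalues    = ev,
    min_eigenvalue = min(ev),
    # share of the total absolute eigenvalue mass carried by negative eigenvalues
    negative_share = sum(abs(ev[ev < 0])) / sum(abs(ev))
  )
}

d <- bray_curtis(toy)
chk <- euclid_check(d)
chk$min_eigenvalue   # how negative the smallest eigenvalue is
chk$negative_share   # values near 0 suggest the matrix is close to Euclidean
```

With the real data, `d` would presumably be the output of `vegdist(pwslog)`. How small the negative share must be before Ward's method is acceptable is a judgment call, and, as the comments note, the result applies only to that particular distance matrix.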

0 Answers