
I have been trying to compute the Jaccard similarity index for every possible pair of my 7 communities and to put the results in a matrix or, preferably, a cluster plot based on the similarity index.

There are 21 pairs, such as Community1 vs Community2, Community1 vs Control, Control vs Community2, etc.
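The count of 21 is the number of unordered pairs among 7 items, choose(7, 2) = 7 × 6 / 2. A minimal base-R sketch that checks this and lists the pairs, using the community names from the table in this question:

```r
# Community names taken from the table in the question
communities <- c(paste0("Community", 1:6), "Control")

# Number of unordered pairs: 7 choose 2
n_pairs <- choose(length(communities), 2)
print(n_pairs)  # [1] 21

# combn() lists every pair, one pair per column
pairs <- combn(communities, 2)
print(ncol(pairs))  # [1] 21
print(pairs[, 1:3]) # the first three pairs
```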

The data look like this:

Communities  AllM  AllP  AnBr  ArHe  ArFi  ArXa  AsGo
Community1   1     1     1     0     0     0     1
Community2   1     0     1     0     0     0     1
Community3   1     1     1     0     0     1     1
Community4   1     1     1     0     0     0     0
Community5   1     1     1     0     0     1     0
Community6   1     1     1     0     0     0     0
Control      1     0     1     0     0     0     1

Each row represents a different community and each column a species. A 1 means the species is present in the community; a 0 means it is absent.
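One layout that works well for this kind of analysis in R is a data frame with one row per community, the community names as row names, and only 0/1 columns. A minimal sketch, assuming the table is read from text here (with a file you would pass its path instead of `text =`):

```r
# The presence/absence table from the question, pasted as text
txt <- "
Communities AllM AllP AnBr ArHe ArFi ArXa AsGo
Community1  1    1    1    0    0    0    1
Community2  1    0    1    0    0    0    1
Community3  1    1    1    0    0    1    1
Community4  1    1    1    0    0    0    0
Community5  1    1    1    0    0    1    0
Community6  1    1    1    0    0    0    0
Control     1    0    1    0    0    0    1
"

# row.names = 1 turns the first column into row names,
# so the remaining columns are purely numeric 0/1 species columns
m <- read.table(text = txt, header = TRUE, row.names = 1)
str(m)

# Sanity checks: every cell is 0 or 1, and no community is empty
stopifnot(all(as.matrix(m) %in% c(0, 1)))
stopifnot(all(rowSums(m) > 0))
```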

My main goal is to create a cluster plot based on the similarity index that shows which community is most similar to the control community in terms of species composition and which ones are very different. How can I compute the Jaccard similarity index for all possible pairs and put the results in a matrix? Better still, how can I create a cluster plot showing the similarities from this data in R, which is my preferred tool? I am also not sure the data is formatted correctly for the analysis.
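The Jaccard similarity of two communities A and B is the number of species they share divided by the number of species found in at least one of them, J = |A ∩ B| / |A ∪ B|. A minimal base-R sketch of the pairwise computation on the sample data from this question; the helper name `jaccard` is hypothetical, and average linkage in `hclust()` is just one common choice:

```r
# Presence/absence table from the question (rows = communities, columns = species)
x <- matrix(c(1, 1, 1, 0, 0, 0, 1,
              1, 0, 1, 0, 0, 0, 1,
              1, 1, 1, 0, 0, 1, 1,
              1, 1, 1, 0, 0, 0, 0,
              1, 1, 1, 0, 0, 1, 0,
              1, 1, 1, 0, 0, 0, 0,
              1, 0, 1, 0, 0, 0, 1),
            nrow = 7, byrow = TRUE,
            dimnames = list(c(paste0("Community", 1:6), "Control"),
                            c("AllM", "AllP", "AnBr", "ArHe", "ArFi", "ArXa", "AsGo")))

# Hypothetical helper: Jaccard similarity of two 0/1 vectors
jaccard <- function(a, b) {
  both   <- sum(a == 1 & b == 1)    # species present in both communities
  either <- sum(a == 1 | b == 1)    # species present in at least one
  if (either == 0) return(NA_real_) # undefined when both communities are empty
  both / either
}

# Fill a symmetric 7 x 7 similarity matrix, one cell per pair
n <- nrow(x)
S <- matrix(NA_real_, n, n, dimnames = list(rownames(x), rownames(x)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    S[i, j] <- jaccard(x[i, ], x[j, ])
  }
}
print(round(S, 2))

# Similarity of each community to the control, most similar first
to_control <- sort(S["Control", rownames(x) != "Control"], decreasing = TRUE)
print(to_control)

# Cluster plot: hclust() expects distances, so cluster on 1 - similarity
hc <- hclust(as.dist(1 - S), method = "average")
plot(hc, main = "Jaccard dissimilarity (1 - J)")
```

On these seven columns, Community2 has exactly the same species as Control (J = 1) and Community5 shares the fewest (J = 0.4).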

Alternatively, could you recommend the best software or tools to do this easily?

Thanks

Vandka
  • All possible combinations by what number of communities? Will you want, for example, combination KHRO1_A_110m KHRO1_B_10m KHRO1_B_110m? How are you defining Jaccard similarity between the three? – ttnphns Dec 10 '17 at 18:02
  • Of course there is no way to calculate a similarity among 3 or more groups at once. I meant the similarity index between two communities, for every possible pair. There are 21 pairs, and I am looking to calculate them all in one go and create a cluster plot, because doing it manually means repeating the same process 21 times. – Vandka Dec 11 '17 at 02:04
  • https://stats.stackexchange.com/q/49453/3277 – ttnphns Dec 11 '17 at 05:49

1 Answer


After searching for a while, I found a very simple way to do this with the "ade4" package in R. It calculates the index for every pair of communities, returns the result as a distance matrix, and draws a hierarchical cluster plot.

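install.packages("ade4")   # only needed once
library(ade4)
# Read the table once; row.names = 1 turns the first column (community names) into row names
m <- read.table("./similarity/Data_pres&abs.tab", header = TRUE, row.names = 1)
# method = 1 is the Jaccard index (1901), S3 coefficient of Gower & Legendre.
# dist.binary() returns the distance d = sqrt(1 - S), not the similarity S itself
d <- dist.binary(m, method = 1, diag = FALSE, upper = FALSE)
hc <- hclust(d)                   # apply hierarchical clustering (complete linkage by default)
plot(hc, labels = rownames(m))    # plot the dendrogram, labelled with the community names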
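If installing ade4 is not an option, base R's `dist()` with `method = "binary"` computes the proportion of species present in only one community among those present in at least one, which is exactly 1 − Jaccard. A minimal sketch on the sample data from the question; the back-conversion from ade4's `sqrt(1 - S)` output is shown only as a comment, so this block runs without ade4:

```r
# Presence/absence table from the question (rows = communities, columns = species)
x <- matrix(c(1, 1, 1, 0, 0, 0, 1,
              1, 0, 1, 0, 0, 0, 1,
              1, 1, 1, 0, 0, 1, 1,
              1, 1, 1, 0, 0, 0, 0,
              1, 1, 1, 0, 0, 1, 0,
              1, 1, 1, 0, 0, 0, 0,
              1, 0, 1, 0, 0, 0, 1),
            nrow = 7, byrow = TRUE,
            dimnames = list(c(paste0("Community", 1:6), "Control"),
                            c("AllM", "AllP", "AnBr", "ArHe", "ArFi", "ArXa", "AsGo")))

D <- dist(x, method = "binary")   # Jaccard distance, 1 - J, for every pair
S <- 1 - as.matrix(D)             # back to the Jaccard similarity matrix
print(round(S, 2))

# With ade4, d <- dist.binary(x, method = 1) holds sqrt(1 - J) instead,
# so the similarity would be recovered as 1 - as.matrix(d)^2.

# Community most similar to the control (excluding the control itself)
others <- setdiff(rownames(x), "Control")
print(others[which.max(S["Control", others])])

hc <- hclust(D)                   # complete linkage on the Jaccard distance
plot(hc)                          # leaves are labelled with the row names
```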
Vandka