I want to represent the co-occurrence of factor groups within clusters as a heat map that reflects how frequently each pair of factors colocalizes in clusters (yellow for more frequently colocalized pairs, red for less frequently colocalized ones). After trying different approaches, I came up with the code below. Is this a sensible way of representing this data set?
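set.seed(1)
# Simulated cluster labels: a uniform background plus two enriched cluster ranges
x <- c(
  paste0("cluster-", sample(1:30000, 800000, replace = TRUE)),
  paste0("cluster-", sample(1:300, 100000, replace = TRUE)),
  paste0("cluster-", sample(600:900, 100000, replace = TRUE))
)
# Simulated factor labels: factors a-c and f-i are over-represented in the enriched ranges
y <- c(
  paste0("factor-", sample(letters[1:19], 800000, replace = TRUE)),
  paste0("factor-", sample(letters[1:3], 100000, replace = TRUE)),
  paste0("factor-", sample(letters[6:9], 100000, replace = TRUE))
)
d <- data.frame(x, y)
tab <- table(d)              # clusters x factors count table (named tab to avoid masking t())
dat <- rbind(tab[, ])        # convert the table to a plain matrix
dats <- dat / rowSums(dat)   # proportion of each factor within each cluster
cdats <- cor(dats)           # factor-by-factor correlation of those proportions across clusters
heatmap(cdats)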
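For comparison, a more direct measure of colocalization than the correlation of proportions would be to count, for each pair of factors, the number of clusters in which both occur. The following is only a minimal sketch under that assumption (presence/absence per cluster, ignoring how many times a factor occurs); the helper name `cooccurrence` and the toy vectors are made up for illustration.

```r
# Hypothetical helper: number of clusters in which each pair of factors co-occurs.
cooccurrence <- function(cluster, fac) {
  tab <- table(cluster, fac)            # clusters x factors count table
  present <- 1 * (unclass(tab) > 0)     # 1 if the factor occurs in the cluster, else 0
  crossprod(present)                    # factors x factors: number of shared clusters
}

# Toy data (illustrative only): factor a is in c1, c2, c3; b in c1; c in c2
toy_cluster <- c("c1", "c1", "c2", "c2", "c3")
toy_factor  <- c("a",  "b",  "a",  "c",  "a")
co <- cooccurrence(toy_cluster, toy_factor)
print(co)
```

The diagonal holds the number of clusters containing each factor, so the off-diagonal counts could be divided by it (or turned into a Jaccard index) if a frequency rather than a raw count is wanted.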
PS: I would also like to add a correlation score legend in the bottom-right corner that shows the colour code for correlations from 1 to 0. How can I do that?
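One possible way to get a colour key with base graphics alone is to draw the matrix with `image()` and put a second, narrow `image()` panel next to it using `layout()`. This is a rough sketch, not a drop-in replacement for `heatmap()`: it draws no dendrograms and does not reorder rows or columns, the key sits to the right of the plot instead of in a corner, and the helper name `plot_with_key` and the toy matrix are assumptions made for illustration.

```r
# Hypothetical helper: draw a square matrix as an image with a colour key on the right.
plot_with_key <- function(m, cols = heat.colors(50),
                          zlim = range(m, finite = TRUE)) {
  old <- par(no.readonly = TRUE)
  on.exit(par(old))                               # restore graphics settings afterwards
  layout(matrix(1:2, nrow = 1), widths = c(5, 1)) # wide panel for the map, narrow one for the key
  n <- nrow(m)
  par(mar = c(6, 6, 2, 1))
  image(1:n, 1:n, m, col = cols, zlim = zlim,
        axes = FALSE, xlab = "", ylab = "")
  axis(1, at = 1:n, labels = rownames(m), las = 2)
  axis(2, at = 1:n, labels = colnames(m), las = 2)
  # Colour key: a one-column image spanning the same value range and colours
  par(mar = c(6, 1, 2, 4))
  key <- seq(zlim[1], zlim[2], length.out = length(cols))
  image(c(0, 1), key, matrix(key, nrow = 1), col = cols, zlim = zlim,
        axes = FALSE, xlab = "", ylab = "")
  axis(4, las = 2)                                # value scale on the right-hand side
  invisible(key)
}

# Toy correlation matrix (illustrative only)
m <- cor(matrix(c(1, 2, 3, 4,  2, 1, 4, 3,  4, 3, 2, 1), ncol = 3,
                dimnames = list(NULL, c("a", "b", "c"))))
key <- plot_with_key(m)
```

Passing `zlim = c(0, 1)` would give a key running from 0 to 1, but `image()` leaves cells outside `zlim` blank, so negative correlations would not be drawn in that case.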