Distance matrix for cluster analysis of species counts in sequential time blocks?

Question

I have a couple of straightforward questions and hope to get some straightforward suggestions. I would like to perform a cluster analysis on species counts recorded in sequential time blocks, so that I can group species into guilds or communities based on temporal co-occurrence (assume I have no useful prior information on these species). As you might have guessed, pairwise Euclidean distance is not getting the job done because it does not incorporate temporal similarity. So I want to compute my own distance matrix to use in hierarchical clustering. Below is an example of a pair of species counts, but there are over 100 species.

What distance or similarity measure would be best suited to this situation? I'm just sort of throwing this one out there. Note there are a lot of 0's in the abundance matrix.
If I were to use a chi-square distance:

0's end up in the denominator and a blank matrix is returned (this happens regardless of the software I use). I like the idea of the chi-square distance, so how can I get the software to assign a distance of 1 in the situation where 0 is in the denominator, instead of crashing out and returning a blank matrix? I am comfortable using R or SAS to accomplish this, so any little snippet of code to accomplish this simple task would be very helpful.
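One way to sidestep the zero denominator is sketched below, in Python rather than R or SAS. The formula from the question is not reproduced here, so this assumes a symmetric chi-square form computed on relative-abundance profiles (each species' counts divided by its own total). The names `chi_square_distance`, `distance_matrix`, and `fallback` are hypothetical. Time blocks where both species are zero are skipped instead of producing 0/0, and under this form two species with no temporal overlap come out at exactly 1 without any special casing.

```python
def chi_square_distance(x, y, fallback=1.0):
    """Symmetric chi-square distance between two count vectors, in [0, 1].

    Assumed form: 0.5 * sum((p - q)^2 / (p + q)), where p and q are the
    counts rescaled to proportions of each species' own total.
    """
    if len(x) != len(y):
        raise ValueError("both species must cover the same time blocks")
    total_x, total_y = sum(x), sum(y)
    # A species that was never observed has no profile; use the fallback.
    if total_x == 0 or total_y == 0:
        return fallback
    dist = 0.0
    for a, b in zip(x, y):
        p, q = a / total_x, b / total_y
        # Skip blocks where both species are absent (the 0/0 case).
        if p + q > 0:
            dist += (p - q) ** 2 / (p + q)
    return 0.5 * dist


def distance_matrix(counts):
    """Square matrix of pairwise distances; one row of counts per species."""
    n = len(counts)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = chi_square_distance(counts[i], counts[j])
    return d
```

Because the counts are rescaled to profiles first, a rare species and an abundant species with the same timing get a distance of 0, which may or may not be what you want for guild assignment.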

Any additional thoughts on the actual clustering method would also be helpful.
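For the clustering step itself, a minimal sketch of agglomerative clustering on a precomputed distance matrix follows. It assumes average linkage, which is only one possible choice but does not require the distances to be Euclidean. The function name `average_linkage_clusters` and the `threshold` parameter are hypothetical, and the pure-Python loop is meant to show the logic, not to be efficient for 100+ species.

```python
def average_linkage_clusters(dist, threshold):
    """Group items using average linkage on a square distance matrix.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold`. Returns lists of row indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def between(a, b):
        # Average of all pairwise distances between members of a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for s in range(len(clusters)):
            for t in range(s + 1, len(clusters)):
                d = between(clusters[s], clusters[t])
                if best is None or d < best[0]:
                    best = (d, s, t)
        if best[0] > threshold:
            break
        _, s, t = best
        clusters[s] = clusters[s] + clusters[t]
        del clusters[t]
    return [sorted(c) for c in clusters]


# Made-up distances for four species, for illustration only.
toy = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
groups = average_linkage_clusters(toy, threshold=0.5)
```

In practice a library routine is the better tool: in R, `hclust(as.dist(D), method = "average")` accepts a custom distance matrix `D` directly.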

Welcome to [stats.se]! Please take a moment to view our [tour]. — Tavrock, Mar 14 '17 at 16:49
See [this answer](http://stats.stackexchange.com/a/173669/3277). It is all right to use chi-square distance. Empty columns are ignored in a `2 x p` count table. — ttnphns, Mar 14 '17 at 16:54
`pairwise euclidean distance is not getting the job done as it does not incorporate temporal similarity` That isn't very clear. You could use Euclidean distance, but chi-square (or phi-square) is the better choice because it is designed for frequencies. — ttnphns, Mar 14 '17 at 16:56
I guess I should say that Euclidean distance is not making good clusters. It groups species into clusters based solely on counts, regardless of whether the two species overlap in time. The chi-square distance addresses this, but the software is not computing the distance matrix correctly because 0 ends up in the denominator when two species have no temporal overlap. — Patrick, Mar 15 '17 at 18:56
