Appropriate negative eigenvalue correction for PCoA of genetic distances

Question

I am trying to find the best way to represent genetic distances in a plane so that I can use them as response variables in canonical redundancy analysis (using rda() in vegan). While there are admittedly many genetic distances to choose from, none of them are Euclidean, which produces negative eigenvalues when we "summarize" them using PCoA (Principal Coordinate Analysis). As far as I know, there are two methods to correct negative eigenvalues: Lingoes and Cailliez. The following explanations of how they are computed are taken from the R documentation for the pcoa() function in the PCNM package, but they are also well explained in ape:

Lingoes: In the Lingoes (1971) procedure, a constant c1, equal to twice absolute value of the largest negative value of the original principal coordinate analysis, is added to each original squared distance in the distance matrix, except the diagonal values. A new principal coordinate analysis, performed on the modified distances, has at most (n-2) positive eigenvalues, at least 2 null eigenvalues, and no negative eigenvalue.
Cailliez: In the Cailliez (1983) procedure, a constant c2 is added to the original distances in the distance matrix, except the diagonal values. The calculation of c2 is described in Legendre and Legendre (1998). A new principal coordinate analysis, performed on the modified distances, has at most (n-2) positive eigenvalues, at least 2 null eigenvalues, and no negative eigenvalue.
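
To make the difference between the two corrections concrete, here is a minimal sketch in plain Python (chosen only to keep it dependency-free; it is not the pcoa() implementation). The function names and the toy matrix are hypothetical, and the constants are passed in rather than computed. Following the wording quoted above, `c1` is the full amount added to each squared distance, and `c2` is the amount added to each distance.

```python
import math

def apply_lingoes(dist, c1):
    """Add c1 to every off-diagonal *squared* distance, then take the square root."""
    n = len(dist)
    return [[0.0 if i == j else math.sqrt(dist[i][j] ** 2 + c1) for j in range(n)]
            for i in range(n)]

def apply_cailliez(dist, c2):
    """Add c2 to every off-diagonal distance (not squared)."""
    n = len(dist)
    return [[0.0 if i == j else dist[i][j] + c2 for j in range(n)]
            for i in range(n)]

# Toy matrix that violates the triangle inequality (3 > 1 + 1), hence non-Euclidean.
toy = [[0.0, 1.0, 3.0],
       [1.0, 0.0, 1.0],
       [3.0, 1.0, 0.0]]

# Worked out by hand for this toy matrix: its only negative PCoA eigenvalue
# is -5/6, so twice its absolute value is 5/3.
lingoes_fixed = apply_lingoes(toy, c1=5.0 / 3.0)

# c2 = 1 is the smallest constant that restores the triangle inequality here;
# it is an illustrative value, not the output of the Cailliez computation.
cailliez_fixed = apply_cailliez(toy, c2=1.0)
```

In both corrected matrices the three points become exactly collinear, but the resulting distances differ (about 1.63 and 3.27 for Lingoes, 2 and 4 for Cailliez), which illustrates why the two corrections can lead to different downstream results.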

During my explorations, I have not come across these transformations in any population/landscape genetics literature. Many authors use Fst-based genetic distances, and favor computing the square root of these distances before applying PCoA, and then removing any remaining negative eigenvalues (these typically account for <1% variance). I have also seen this for Dps (proportion of shared allele distance, or Bray-Curtis distance), but in community composition studies. In fact this is a procedure that is recommended in the CANOCO 5 manual (p.104).

I would normally shrug this off after trying a few and defaulting to what the literature seems to favour, but I explored all of these transformations on an Fst-based genetic distance matrix and the downstream results vary considerably. Namely, the genetic variance explained by my models increases by an adjusted R2 of about 0.10 when I use the Cailliez transformation instead of computing sqrt() prior to the PCoA. My instinct tells me not to go down that path, but I'm really uncertain why these different corrections yield such drastically different results, and why I should favor the square root correction.

Thanks for any insight you may have.

My own favourite strategy is to set to zero the negative eigenvalues and then proportionally correct the positive ones so that their sum = the initial sum of all eigenvalues = the trace of the doubly centered matrix of distances. I've tried 3 or 4 different ways and found that this one approximates the original distances by the final euclidean distances usually better than the other methods. — ttnphns, May 17 '14 at 07:18
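
The strategy described in the comment above can be sketched in a few lines of plain Python. This is only one reading of that description, not the commenter's actual code; the function name and the example eigenvalues are hypothetical.

```python
def clip_and_rescale(eigenvalues):
    """Zero out negative eigenvalues, then scale the rest so the total is unchanged."""
    total = sum(eigenvalues)  # equals the trace of the doubly centered matrix
    clipped = [max(ev, 0.0) for ev in eigenvalues]
    positive_sum = sum(clipped)
    if total <= 0 or positive_sum == 0:
        raise ValueError("rescaling needs a positive eigenvalue sum")
    scale = total / positive_sum
    return [ev * scale for ev in clipped]

# Hypothetical eigenvalues from a PCoA of a non-Euclidean distance matrix.
raw = [4.5, 1.0, 0.0, -0.5]
corrected = clip_and_rescale(raw)
```

Because principal coordinates are the eigenvectors scaled by the square roots of their eigenvalues, the rescaled eigenvalues would replace the raw ones in that scaling step.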
Remark: The two methods you cite in your question can be a way out, surely (see e.g. [an example](http://stats.stackexchange.com/a/90901/3277)), and you have to redo the PCoA. The 4 methods that I tried I did _within_ PCoA inside the code of its function (which is easy because PCoA is very easy to code). — ttnphns, May 17 '14 at 07:30
One more notion. PCoA is done on _squared_ distances. I mean it takes the input dissimilarities as if squared euclidean distances, so that if the input are true squared euclidean distances, the analysis actually amounts to Principal component analysis. Some PCoA functions assume the input is already squared, some will square it for you. But in no case square root can be favoured (you write `favor computing the square root of these distances`). — ttnphns, May 17 '14 at 07:58
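
The point about squared distances can be checked with a small, dependency-free Python sketch of the double-centering step that PCoA performs before the eigendecomposition. The function name and the three-point example are hypothetical; this version takes unsquared distances and squares them internally.

```python
def double_center(dist):
    """Return B = -1/2 * J * D^2 * J, the matrix that PCoA eigen-decomposes."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    # The matrix is symmetric, so row means double as column means.
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]

# Three points on a line: their pairwise distances are truly Euclidean.
coords = [0.0, 1.0, 3.0]
dist = [[abs(x - y) for y in coords] for x in coords]
gram = double_center(dist)

# Cross-products of the centered coordinates, which is what PCA works with.
mean = sum(coords) / len(coords)
centered = [x - mean for x in coords]
expected = [[xi * xj for xj in centered] for xi in centered]
```

For truly Euclidean input, `gram` matches `expected` entry by entry, which is why PCoA reduces to PCA in that case. With non-Euclidean input the doubly centered matrix is no longer a valid cross-product matrix, and negative eigenvalues appear.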
Xavier is talking about taking a square root of Fst (fixation index) based distances, that do not even satisfy the triangle inequality, so (not) taking the square root doesn't make it more or less suitable for PCoA. — Nik Tuzov, Jul 27 '17 at 21:23
