I am trying to find the best way to represent genetic distances in a Euclidean space so that I can use them as response variables in canonical redundancy analysis (using rda() in vegan). While there are admittedly many genetic distances to choose from, many of them are not Euclidean, which produces negative eigenvalues when we "summarize" them using PCoA (Principal Coordinates Analysis). As far as I know, there are two corrections for negative eigenvalues: Lingoes and Cailliez. The following explanations of how they are computed are taken from the R documentation for the pcoa() function in the PCNM package, but they are also well explained in ape:
Lingoes: In the Lingoes (1971) procedure, a constant c1, equal to twice the absolute value of the most negative eigenvalue of the original principal coordinate analysis, is added to each original squared distance in the distance matrix, except the diagonal values. A new principal coordinate analysis, performed on the modified distances, has at most (n-2) positive eigenvalues, at least 2 null eigenvalues, and no negative eigenvalue.
Cailliez: In the Cailliez (1983) procedure, a constant c2 is added to the original distances in the distance matrix, except the diagonal values. The calculation of c2 is described in Legendre and Legendre (1998). A new principal coordinate analysis, performed on the modified distances, has at most (n-2) positive eigenvalues, at least 2 null eigenvalues, and no negative eigenvalue.
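The two corrections, and the PCoA they feed into, can be sketched in a few lines of NumPy. This is a minimal illustration of my reading of the descriptions above, not the code of pcoa() in PCNM or ape: PCoA is done by double-centring -0.5·D², c1 follows the Lingoes definition above, and the Cailliez constant is taken as the largest eigenvalue of the 2n × 2n matrix [[0, 2Δ1], [-I, -4Δ2]], where Δ1 and Δ2 are the double-centred -0.5·D² and -0.5·D. The helper names are hypothetical, and the 4 × 4 matrix is a toy metric (one point at distance 1 from three points that are 2 apart from each other) chosen only because it cannot be embedded in Euclidean space.

```python
import numpy as np


def gower_centre(a):
    """Double-centre a square matrix: J a J with J = I - 11'/n."""
    n = a.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    return j @ a @ j


def pcoa_eigenvalues(d):
    """PCoA eigenvalues (descending) of the double-centred -0.5 * d**2."""
    g = gower_centre(-0.5 * d**2)
    return np.sort(np.linalg.eigvalsh(g))[::-1]


def lingoes(d):
    """Lingoes: c1 = twice |most negative eigenvalue|, added to the
    off-diagonal squared distances."""
    c1 = 2.0 * abs(min(pcoa_eigenvalues(d).min(), 0.0))
    d_new = np.sqrt(d**2 + c1)
    np.fill_diagonal(d_new, 0.0)
    return d_new, c1


def cailliez(d):
    """Cailliez: c2 = largest eigenvalue of [[0, 2*delta1], [-I, -4*delta2]],
    added to the off-diagonal distances."""
    n = d.shape[0]
    delta1 = gower_centre(-0.5 * d**2)
    delta2 = gower_centre(-0.5 * d)
    big = np.block([[np.zeros((n, n)), 2.0 * delta1],
                    [-np.eye(n), -4.0 * delta2]])
    c2 = np.linalg.eigvals(big).real.max()
    d_new = d + c2
    np.fill_diagonal(d_new, 0.0)
    return d_new, c2


# Toy non-Euclidean metric: point 0 is 1 away from points 1-3, which are 2 apart.
d = np.array([[0, 1, 1, 1],
              [1, 0, 2, 2],
              [1, 2, 0, 2],
              [1, 2, 2, 0]], dtype=float)

raw = pcoa_eigenvalues(d)
d_lin, c1 = lingoes(d)
d_cai, c2 = cailliez(d)
print("raw eigenvalues:", np.round(raw, 4))
print("Lingoes  c1 =", round(c1, 4), np.round(pcoa_eigenvalues(d_lin), 4))
print("Cailliez c2 =", round(c2, 4), np.round(pcoa_eigenvalues(d_cai), 4))
```

On this toy matrix the raw eigenvalues are 2, 2, 0 and -0.25. Lingoes (c1 = 0.5) raises every eigenvalue except the trivial one by exactly c1/2, giving 2.25, 2.25, 0, 0, while Cailliez gives c2 = (√3 - 1)/2 ≈ 0.366 and eigenvalues of about 2.80, 2.80, 0, 0. Both end with n - 2 = 2 positive axes, but they reshape the distances differently: Lingoes adds the same amount to every squared distance, Cailliez the same amount to every distance.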
During my explorations, I have not come across these corrections anywhere in the population or landscape genetics literature. Many authors use Fst-based genetic distances and prefer to take the square root of these distances before applying PCoA, then discard any remaining negative eigenvalues (these typically account for <1% of the variance). I have also seen this done for Dps (the proportion-of-shared-alleles distance, i.e. Bray-Curtis distance), though in community composition studies. In fact, this procedure is recommended in the CANOCO 5 manual (p. 104).
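The square-root route can be sketched the same way: take the square root of each distance, run an ordinary PCoA, and keep only the axes with positive eigenvalues. This is a minimal NumPy sketch, not the CANOCO or vegan procedure; the function names are hypothetical, and the "share of variance" carried by negative eigenvalues is computed here as their share of the summed absolute eigenvalues, which is only one of several conventions. The toy matrix is the same hypothetical star-shaped metric as above.

```python
import numpy as np


def pcoa(d):
    """Classical PCoA: coordinates on the positive-eigenvalue axes, plus all eigenvalues."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    g = j @ (-0.5 * d**2) @ j
    vals, vecs = np.linalg.eigh(g)
    order = np.argsort(vals)[::-1]  # largest eigenvalue first
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > 1e-10 * vals.max()  # drop null and negative axes
    coords = vecs[:, keep] * np.sqrt(vals[keep])
    return coords, vals


def negative_share(vals):
    """Share of the summed |eigenvalues| carried by negative eigenvalues."""
    return np.abs(vals[vals < 0]).sum() / np.abs(vals).sum()


d = np.array([[0, 1, 1, 1],
              [1, 0, 2, 2],
              [1, 2, 0, 2],
              [1, 2, 2, 0]], dtype=float)

raw_coords, raw_vals = pcoa(d)
sq_coords, sq_vals = pcoa(np.sqrt(d))
print("negative share, raw :", round(negative_share(raw_vals), 4))
print("negative share, sqrt:", round(negative_share(sq_vals), 4))

# Check that the retained axes reproduce the square-rooted distances.
diff = sq_coords[:, None, :] - sq_coords[None, :, :]
recon = np.sqrt((diff**2).sum(axis=-1))
print("sqrt distances reproduced:", np.allclose(recon, np.sqrt(d)))
```

Here the raw matrix puts 0.25/4.25 ≈ 6% of its absolute eigenvalue mass on a negative axis, whereas its square root happens to be fully Euclidean, so three positive axes reproduce the square-rooted distances exactly. With real Fst-based matrices a small negative remainder may survive, which is what gets discarded.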
Normally I would shrug this off after trying a few options and default to what the literature seems to favor, but I explored all of these transformations on an Fst-based genetic distance matrix and the downstream results vary considerably. Namely, the adjusted R² of my models (the genetic variance they explain) increases by about 0.10 when I use the Cailliez correction instead of applying sqrt() prior to the PCoA. My instinct tells me not to go down that path, but I'm really uncertain why these different corrections yield such drastically different results, and why I should favor the square-root correction.
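Since the corrections are being compared through adjusted R², it may help to spell out what that number is. In an RDA, R² is the fraction of the response's total inertia (sum of squares) captured by a least-squares fit on the explanatory variables, so anything a correction adds to, or reshapes in, the PCoA axes changes R² even with the same predictors. The sketch below assumes the response is the matrix of retained PCoA coordinates and uses the common Ezekiel adjustment, 1 - (1 - R²)(n - 1)/(n - m - 1); the function name and toy data are hypothetical, and this is not vegan's implementation.

```python
import numpy as np


def rda_r2(y, x):
    """R^2 of an RDA of y (n x p) on x (n x m): fitted inertia / total inertia,
    plus the Ezekiel-adjusted value."""
    n = y.shape[0]
    yc = y - y.mean(axis=0)  # centre responses
    xc = x - x.mean(axis=0)  # centre predictors
    beta, *_ = np.linalg.lstsq(xc, yc, rcond=None)
    fitted = xc @ beta
    r2 = (fitted**2).sum() / (yc**2).sum()
    m = np.linalg.matrix_rank(xc)  # number of independent predictors
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)
    return r2, r2_adj


# Hypothetical toy data: two response axes, one explanatory variable.
x = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0]])
y = np.column_stack([x[:, 0], [1.0, -1.0, 0.0, -1.0, 1.0]])
print(rda_r2(y, x))
```

In this toy case the first axis is fully explained and the second (orthogonal to x) not at all, so R² = 10/14 ≈ 0.714 and the adjusted R² is 13/21 ≈ 0.619. Running the same function on PCoA coordinates from each correction, with the same x, isolates how much of the difference comes from the response side alone.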
Thanks for any insight you may have.