
As I understand it, Pearson PCA finds eigenvectors of the Pearson correlation matrix. The result is a coordinate system with dimensions that are linearly uncorrelated.

But Spearman PCA does this with the Spearman correlation matrix, and therefore the dimensions are not only linearly uncorrelated but monotonically uncorrelated. At least to me, that seems much stronger.

Is my understanding wrong? Why is Pearson PCA so much more popular than Spearman PCA? And by popular, I mean that most libraries only ever provide the former.

I do understand that Pearson invented PCA, but that was more than 100 years ago.
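For what it's worth, the equivalence I have in mind is easy to check numerically: the Spearman correlation matrix of a data set is just the Pearson correlation matrix of its column-wise ranks. A small sketch using NumPy/SciPy (the skewed toy data are made up):

```python
import numpy as np
from scipy import stats

# Made-up skewed toy data: 100 samples, 3 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) ** 3

# Spearman correlation matrix of the columns
rho, _ = stats.spearmanr(X)

# Pearson correlation of the column-wise ranks gives the same matrix
ranks = stats.rankdata(X, axis=0)
r_of_ranks = np.corrcoef(ranks, rowvar=False)

print(np.allclose(rho, r_of_ranks))  # True
```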

asked by rhombidodecahedron

  • One answer among several is that if you want Spearman PCA, you just need to calculate ranks first and then feed them to PCA. So, in terms of software provision, there is no point in separate routines (functions, commands, whatever), as whatever software allows PCA would typically provide rank calculations. A looser but more fundamental answer is that modelling or predicting ranks is usually less interesting and less useful. Most of the time we want to treat variables as first measured, without loss of information. Being tallest or 12th heaviest in a sample is not much information. – Nick Cox Dec 10 '15 at 09:58
  • PCA finds all the eigenvectors and eigenvalues (not just those with the largest eigenvalues). Whether you care about them all is another matter. There are some contexts in which PCs with the lowest eigenvalues are those to look at first. – Nick Cox Dec 10 '15 at 10:01
  • @NickCox good point, edited – rhombidodecahedron Dec 10 '15 at 10:05
  • @NickCox regarding the point about ranks, I suppose an example where Spearman would be preferable (and without information loss) is when the entities are on an equally spaced grid? – rhombidodecahedron Dec 10 '15 at 10:08
  • There is always information loss. If you tell me the ranks in that case, or any other, I can't tell you the original values. – Nick Cox Dec 10 '15 at 10:12
  • @NickCox If my values are (1,2,3,4) then the ranks are also (1,2,3,4) and there is no information loss. – rhombidodecahedron Dec 10 '15 at 10:15
  • Agreed: **if you look at the data too** then in some instances you can see that Pearson correlation would give the same answer. But you can't infer the data just from the ranks. Similarly you can't invert PCA results based on ranks to get back to the data. If someone tells you the answers to a test, you can get a perfect score. In itself, that doesn't gauge knowledge. – Nick Cox Dec 10 '15 at 10:24
  • Imagine two Spearman PCAs based on height and weight for two sets of people. How do you compare them? – Nick Cox Dec 10 '15 at 10:29
  • Agreed with all that's been said already by @Nick. Linear PCA, which your Q is about, is a linear transform of data and is also based on Pearson r, which is also a linear summarization of data. Spearman rho is like Pearson r, but after ranking of the data. The tie between the pre-rank data and the PCA results will be lost. As for Kendall tau, it is not an [SSCP-type](http://stats.stackexchange.com/a/22520/3277) similarity at all, and linear PCA would be weird to use with it. Linear PCA mathematically makes sense only with SSCP-type measures. – ttnphns Dec 10 '15 at 14:17
  • Questions similar to yours were already asked on this site, mostly about factor analysis ([such as](http://stats.stackexchange.com/q/141646/3277)), but they are relevant for PCA too. – ttnphns Dec 10 '15 at 14:19
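Summing up the software point from the comments: any library's PCA becomes a "Spearman PCA" if you rank each column first, since the Pearson correlation of ranks is the Spearman correlation of the original data. A minimal sketch (the function name `spearman_pca` and the lognormal toy data are my own invention, not from any library):

```python
import numpy as np
from scipy import stats

def spearman_pca(X):
    """Sketch of "Spearman PCA": rank each column, then do ordinary
    PCA (eigendecomposition of the Pearson correlation matrix) on
    the ranks. X is an (n_samples, n_features) array."""
    ranks = stats.rankdata(X, axis=0)      # column-wise ranks
    R = np.corrcoef(ranks, rowvar=False)   # == Spearman correlation of X
    eigvals, eigvecs = np.linalg.eigh(R)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]      # largest eigenvalue first
    return eigvals[order], eigvecs[:, order]

# Made-up lognormal data, where ranking genuinely changes the scale
rng = np.random.default_rng(1)
X = np.exp(rng.normal(size=(200, 4)))
vals, vecs = spearman_pca(X)
```

Note the trade-off Nick Cox describes: the loadings in `vecs` refer to the ranked variables, so they cannot be mapped back onto the original measurement scale.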

0 Answers