
This question is closely related to "Is it acceptable to reverse a sign of a principal component score?".

I have a matrix A with samples in rows and features in columns. I apply PCA to A and get a matrix of samples in PC space (PCA_1). In this reduced space, I calculate the correlation between samples.

Now I repeat exactly the same process. As the sign of a PC is arbitrary, I might get a different matrix of samples in PC space (PCA_2, with an inverted sign in PC2). Nothing to worry about, since the interpretation is the same. But when I now calculate the correlation between samples, the result is quite different.

    PCA_1     PC1 PC2 PC3      PCA_2     PC1 PC2 PC3
    Sample1     1   1  -2      Sample1     1  -1  -2
    Sample2     2  -2  -4      Sample2     2   2  -4
    Sample3     4  -3  -6      Sample3     4   3  -6

I created a reproducible example in R by building example matrices in PC space with different signs. (I didn't manage to reproduce a PCA with different signs on my own computer, but I know it happens when running on two different computers.)

# Row and column labels shared by both matrices
labels <- list(c("Sample1", "Sample2", "Sample3"), c("PC1", "PC2", "PC3"))

# Samples in PC space; PCA_2 is PCA_1 with the sign of PC2 inverted
PCA_1 <- matrix(c(1, 1, -2, 2, -2, -4, 4, -3, -6),
                byrow = TRUE, nrow = 3, dimnames = labels)
PCA_2 <- matrix(c(1, -1, -2, 2, 2, -4, 4, 3, -6),
                byrow = TRUE, nrow = 3, dimnames = labels)

PCA_1
PCA_2

cor(t(PCA_1))
cor(t(PCA_2))

Which produces

> cor(t(PCA_1))
          Sample1   Sample2   Sample3
Sample1 1.0000000 0.7559289 0.7313071
Sample2 0.7559289 1.0000000 0.9993217
Sample3 0.7313071 0.9993217 1.0000000
> cor(t(PCA_2))
          Sample1   Sample2   Sample3
Sample1 1.0000000 0.7559289 0.8122396
Sample2 0.7559289 1.0000000 0.9958706
Sample3 0.8122396 0.9958706 1.0000000

The correlations between Sample1 and Sample3 are different (0.73 vs. 0.81), and those between Sample2 and Sample3 differ slightly as well. Why? How can I know which one is closer to the truth?
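
The sign flip can also be simulated on a single machine by negating one column of the scores from an actual PCA fit. The following is only a minimal sketch under stated assumptions: the matrix `A` is made-up random data (not the real data), and the names `scores_1`, `scores_2`, `cor_1`, `cor_2` and `max_cor_diff` are hypothetical. It uses `prcomp` from base R and the same `cor(t(...))` call as above.

```r
# Hypothetical data: 6 samples x 4 features, random values for illustration only
set.seed(1)
A <- matrix(rnorm(24), nrow = 6,
            dimnames = list(paste0("Sample", 1:6), paste0("Feature", 1:4)))

# Samples in PC space (scores), keeping the first three components
scores_1 <- prcomp(A)$x[, 1:3]

# Simulate what a second machine might return: same PCA, PC2 sign inverted
scores_2 <- scores_1
scores_2[, "PC2"] <- -scores_2[, "PC2"]

# Correlation between samples (rows), computed as in the example above
cor_1 <- cor(t(scores_1))
cor_2 <- cor(t(scores_2))

# Largest absolute change in any sample-to-sample correlation
max_cor_diff <- max(abs(cor_1 - cor_2))
print(round(cor_1, 3))
print(round(cor_2, 3))
print(max_cor_diff)
```

Negating a column by hand is an assumption about what the other computer does. It mimics only the arbitrary sign and nothing else that might differ between machines.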

Paquito
