This question is closely related to "Is it acceptable to reverse a sign of a principal component score?".
I have a matrix A with samples in rows and features in columns. I apply PCA to A and get a matrix of samples in PC space (PCA_1). In this reduced space, I calculate the correlation between samples.
Now I repeat exactly the same process. As the sign of a PC is arbitrary, I might get a different matrix of samples in PC space (PCA_2, with an inverted sign in PC2). Nothing to worry about so far, the interpretation is the same. But when I now calculate the correlation between samples, the result is quite different.
> PCA_1                  > PCA_2
        PC1 PC2 PC3              PC1 PC2 PC3
Sample1   1   1  -2      Sample1   1  -1  -2
Sample2   2  -2  -4      Sample2   2   2  -4
Sample3   4  -3  -6      Sample3   4   3  -6
I created a reproducible example in R by writing out example matrices in PC space with different signs (I didn't manage to reproduce a PCA with different signs on my own computer, but I know it happens when running on 2 different computers):
# Scores in PC space; PCA_2 is PCA_1 with the sign of PC2 flipped
PCA_1 = matrix(c(1,1,-2,2,-2,-4,4,-3,-6), byrow = TRUE, nrow = 3,
               dimnames = list(c("Sample1","Sample2","Sample3"), c("PC1","PC2","PC3")))
PCA_2 = matrix(c(1,-1,-2,2,2,-4,4,3,-6), byrow = TRUE, nrow = 3,
               dimnames = list(c("Sample1","Sample2","Sample3"), c("PC1","PC2","PC3")))
PCA_1
PCA_2
# Correlation between samples (rows), hence the transpose
cor(t(PCA_1))
cor(t(PCA_2))
Which produces
> cor(t(PCA_1))
          Sample1   Sample2   Sample3
Sample1 1.0000000 0.7559289 0.7313071
Sample2 0.7559289 1.0000000 0.9993217
Sample3 0.7313071 0.9993217 1.0000000
> cor(t(PCA_2))
          Sample1   Sample2   Sample3
Sample1 1.0000000 0.7559289 0.8122396
Sample2 0.7559289 1.0000000 0.9958706
Sample3 0.8122396 0.9958706 1.0000000
The correlations between Sample1 and Sample3 (and, to a lesser extent, between Sample2 and Sample3) are different. Why? How can I tell which one is closer to the truth?
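
To narrow down where the difference comes from, here is a minimal sketch (a check, not a definitive answer) that compares several sample-to-sample quantities before and after the sign flip. It relies on the fact that `cor()` subtracts each sample's mean across the PCs, and that this mean changes when a single PC changes sign. The random matrix `A`, the seed, and the manual flip of PC2 with `sweep()` are my own assumptions for illustration; they are not the real data.

```r
# Toy score matrices with the same values as in the example above
PCA_1 <- matrix(c(1, 1, -2, 2, -2, -4, 4, -3, -6), byrow = TRUE, nrow = 3,
                dimnames = list(paste0("Sample", 1:3), paste0("PC", 1:3)))
PCA_2 <- matrix(c(1, -1, -2, 2, 2, -4, 4, 3, -6), byrow = TRUE, nrow = 3,
                dimnames = list(paste0("Sample", 1:3), paste0("PC", 1:3)))

# Per-sample means across PCs: these are what cor() centers on,
# and they differ between the two sign conventions
rowMeans(PCA_1)
rowMeans(PCA_2)

# Compare three sample-to-sample quantities across the two matrices
dist_same  <- isTRUE(all.equal(as.matrix(dist(PCA_1)), as.matrix(dist(PCA_2))))
inner_same <- isTRUE(all.equal(tcrossprod(PCA_1), tcrossprod(PCA_2)))
cor_same   <- isTRUE(all.equal(cor(t(PCA_1)), cor(t(PCA_2))))
c(dist = dist_same, inner = inner_same, cor = cor_same)

# Same check on scores from an actual PCA (hypothetical random data),
# with the sign of PC2 flipped by hand to mimic the second computer
set.seed(1)
A <- matrix(rnorm(20 * 5), nrow = 20)          # 20 samples x 5 features
scores_1 <- prcomp(A)$x[, 1:3]                 # first three PCs
scores_2 <- sweep(scores_1, 2, c(1, -1, 1), "*")
sim_dist_same <- isTRUE(all.equal(as.matrix(dist(scores_1)),
                                  as.matrix(dist(scores_2))))
sim_cor_diff  <- max(abs(cor(t(scores_1)) - cor(t(scores_2))))
```

On the toy matrices, the Euclidean distances and the plain inner products between samples are identical for PCA_1 and PCA_2, while the row-wise correlations are not. This suggests the discrepancy comes from the per-sample centering inside `cor()` rather than from the PCA itself.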