
I'm used to posting on Stack Overflow for code questions, so I'm not sure about the rules here, but I'm having trouble with a principal components analysis and I don't know where to find help. Either I did something wrong or there is a bug in the program I'm using.

As far as I understand, one of the reasons to use PCA is to reduce collinearity; in other words, the components should be less correlated than the raw variables (in fact, they should not be correlated at all: the correlation coefficients should be 0).
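
To make that expectation concrete, here is a minimal sketch in R with made-up data (not my real dataset): the score columns of an unrotated PCA should be pairwise uncorrelated by construction.

```r
## Toy data: three variables, two of them deliberately collinear.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
X  <- cbind(x1,
            x1 + rnorm(n, sd = 0.5),  # collinear with x1
            rnorm(n))

pca <- prcomp(X, center = TRUE, scale. = TRUE)
round(cor(pca$x), 10)  # identity matrix: off-diagonal entries are 0
```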

I ran the PCA in a stats program (JASP) using varimax orthogonal rotation (the variables are centered automatically). It gave me 6 components and reported the correlation between each pair of them as 0 (this was also an automatic output, so I don't know how it was obtained), as it should be.

However, if I copy-paste the values of each component and correlate them manually (a zero-order correlation between components 1 & 2, 2 & 3, etc.), the correlation is not 0; in fact, the correlation between components 1 and 2 is 0.5. This happens with the correlation matrix option in JASP, the cor function in R, and the CORREL function in Excel. Am I correct in thinking that there is an issue with the PCA, and that the components are not in fact uncorrelated, for some strange reason I have yet to discover?
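
For what it's worth, continuing the toy sketch above, here are two ways I can reproduce a nonzero "component correlation" in R even though the PCA itself is fine. I don't know whether either mechanism matches what JASP does internally, so treat this as a guess:

```r
## Same toy setup as above.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
X  <- cbind(x1, x1 + rnorm(n, sd = 0.5), rnorm(n))
pca    <- prcomp(X, center = TRUE, scale. = TRUE)
scores <- pca$x

## (1) Correlating the loading vectors (columns of pca$rotation) instead
## of the score columns: the loadings are orthogonal as vectors (zero
## dot products), but Pearson correlation re-centers each column first,
## so it is generally not zero.
round(crossprod(pca$rotation), 10)  # identity: vectors are orthogonal
round(cor(pca$rotation), 3)         # typically NOT the identity

## (2) Applying an orthogonal (varimax) rotation to the *raw* scores:
## raw score columns have unequal variances, and rotating mixes those
## variances, which creates correlation. Rotating the *standardized*
## (unit-variance) scores keeps the correlations at 0.
vm <- varimax(pca$rotation)
round(cor(scores %*% vm$rotmat), 3)          # can be far from 0
round(cor(scale(scores) %*% vm$rotmat), 10)  # identity again
```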

I haven't posted here much, so please let me know if I need to provide more detail. I'm so confused that I don't even know exactly what I'm looking for, but I can't use the components while they are correlated, so I don't know how to continue my analysis...

EDIT: added some details in brackets

Questioning
  • This sounds like the issue extensively discussed at https://stats.stackexchange.com/questions/139047. Does that thread help? (If it isn't clear that SVD is basically PCA, see https://stats.stackexchange.com/questions/134282.) Regardless, you need to provide more details about the choices you made in performing PCA, especially concerning whether you centered your variables first and concerning how you computed correlations between the components. – whuber Jan 26 '21 at 22:50
  • Thanks, I added a few details and will read the thread. So I guess I'm not going crazy, just unlucky. – Questioning Jan 26 '21 at 23:09
  • I think there are two issues. Imagine that you were only dealing with two dimensions as an extreme case: the calculated numerical correlation of $(a,b)$ and $(c,d)$ will be $+1$ or $-1$ unless $a=b$ or $c=d$. That is before you do anything sophisticated. Then suppose you have a lot of 2D data, and you run PCA: the two components you get will be orthogonal (their dot product will be zero) but one will exactly determine the other up to sign so they are not in any sense independent. Using more dimensions will make both these effects less extreme, but not make them go away. – Henry Jan 26 '21 at 23:43
  • @Henry, do you mean that the components should still be less correlated than the raw variables? Because they aren't, actually, which is why I'm confused. Also, I'm not sure what the second issue is. – Questioning Jan 26 '21 at 23:50
  • The second issue is that if you tell me one component from the PCA is, for example, $(0.8,-0.6)$, then I know the other component must be either $(0.6,0.8)$ or $(-0.6,0.8)$, since those are the only orthogonal unit vectors; so there is total dependence in the 2D case and partial dependence in higher dimensions. (See the numeric check after these comments.) – Henry Jan 27 '21 at 00:12
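
For other readers: Henry's 2D example can be checked numerically in R. The Pearson correlation of any two non-constant 2-vectors is $+1$ or $-1$, because centering a 2-vector $(a,b)$ always yields $(d,-d)$ with $d=(a-b)/2$.

```r
## Correlating the two loading vectors from Henry's comment:
cor(c(0.8, -0.6), c(0.6, 0.8))   # -1
cor(c(0.8, -0.6), c(-0.6, 0.8))  # -1
```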

0 Answers