I wanted to use principal component analysis to create an index from two variables of ratio type. I am using the correlation matrix between them during the analysis. I want to use the first principal component scores as an index.

Higher values of one of these variables mean a better condition, while higher values of the other mean a worse condition. That is, lower values are better for the second variable. What I want is to create an index that indicates the overall condition. Is there anything I should do before running PCA to get the first principal component scores in this situation?

Blain Waan
  • The resulting first principal component can be given whichever sign you prefer. The bigger deal is that the usefulness of the first PC depends very much on how far the two variables are linearly related, so you could consider whether transforming either or both variables makes things clearer. – Nick Cox Feb 29 '16 at 09:03
  • Thank you @NickCox. If I create an index from these variables (i.e. if they are correlated) and I want to use that index as a measure of overall condition, then how do I know whether bigger or smaller values of the index indicate better condition? Since the two variables are going in opposite directions, I am a little confused. I was wondering whether it would be wise to multiply one of them by −1 (I am not sure if that is at all necessary or would be a rather horrible idea). – Blain Waan Feb 29 '16 at 10:51
  • @Blain, if you care about the sign of your PC scores, you need to fix it *after* doing PCA. You can e.g. fix the sign so that it is the same as your variable 1 (this means: do PCA, check correlation of the PC with variable 1, if it is negative, flip the sign). However, can you tell us if you are going to standardize your variables (make them both unit variance) before running PCA or not? – amoeba Feb 29 '16 at 11:25
  • Yes, basically I'll analyze the correlation matrix, not the covariance matrix. Thanks for your comment @amoeba. – Blain Waan Feb 29 '16 at 11:45
  • I am asking because any correlation matrix of two variables has the same eigenvectors, see my answer here: http://stats.stackexchange.com/questions/140434. So you don't need to bother with PCA, you can just flip the sign of one of your variables and average them. You will get exactly the same thing. – amoeba Feb 29 '16 at 11:56
  • @amoeba I think you might have overlooked the scaling that occurs in going from a covariance matrix to a correlation matrix. Your recipe works provided the *standardized* variables are being averaged, not the original variables themselves. – whuber Feb 29 '16 at 21:29
  • @whuber: Yes, averaging the standardized variables is indeed what I meant; I just did not write it precisely enough in a hurry. – amoeba Feb 29 '16 at 21:38

1 Answer

What you call the "direction" of your variables can be thought of as a sign, because flipping the sign of any variable will flip its "direction". The signs of individual variables that go into PCA do not have any influence on the PCA outcome because the signs of PCA components themselves are arbitrary. See here: Does the sign of scores or of loadings in PCA or FA have a meaning? May I reverse the sign?

This means that if you care about the sign of your PC scores, you need to fix it after doing PCA.

This situation arises frequently. You can e.g. fix the sign of PC1 so that it corresponds to the sign of your variable 1. This means: do PCA, check the correlation of PC1 with variable 1 and if it is negative, flip the sign of PC1.
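The sign-fixing step can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `pc1_scores_aligned`, the parameter `ref_col`, and the simulated `good` and `bad` variables are all hypothetical and not from the question.

```python
import numpy as np

def pc1_scores_aligned(X, ref_col=0):
    """PC1 scores from correlation-matrix PCA, sign-aligned with column ref_col.

    Hypothetical helper: name and signature are illustrative only.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each column; this is what "PCA on the correlation matrix" means.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    # eigh returns eigenvalues in ascending order, so the last column is PC1.
    _, eigvecs = np.linalg.eigh(R)
    scores = Z @ eigvecs[:, -1]
    # The sign of PC1 is arbitrary: flip it if it disagrees with the reference variable.
    if np.corrcoef(scores, Z[:, ref_col])[0, 1] < 0:
        scores = -scores
    return scores

# Simulated data (an assumption): "good" is higher-is-better, "bad" is higher-is-worse.
rng = np.random.default_rng(0)
good = rng.normal(size=200)
bad = -0.8 * good + 0.6 * rng.normal(size=200)

index = pc1_scores_aligned(np.column_stack([good, bad]), ref_col=0)
```

Aligning with the higher-is-better variable makes larger index values mean better overall condition; choosing `ref_col=1` instead would simply negate the index.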


That said, note that you are planning to do PCA on the correlation matrix of only two variables. Any correlation matrix of two variables has the same eigenvectors, $(1, 1)/\sqrt{2}$ and $(1, -1)/\sqrt{2}$; see my answer here: Does a correlation matrix of two variables always have the same eigenvectors? So in fact you do not need to bother with PCA: you can center and standardize ($z$-score) both variables, flip the sign of one of them, and average the resulting $z$-scores. Provided the two variables are negatively correlated, as their opposite directions suggest, this gives the same thing as PC1 from the actual PCA, up to a constant scale factor of $\sqrt{2}$ and the arbitrary overall sign.
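A quick numerical check of the two-variable shortcut, as a sketch on simulated data (the variables `x1` and `x2` and the helper `zscore` are assumptions made for illustration). It compares the sign-flipped average of $z$-scores with PC1 scores from an eigendecomposition of the correlation matrix; PC1 uses unit-length weights $1/\sqrt{2}$ while the average uses weights $1/2$, so the two are expected to differ only by a factor of $\sqrt{2}$.

```python
import numpy as np

# Simulated, negatively related variables (an assumption for illustration).
rng = np.random.default_rng(1)
x1 = rng.normal(size=300)               # higher is better
x2 = -0.7 * x1 + rng.normal(size=300)   # higher is worse

def zscore(x):
    """Center and scale to unit sample variance."""
    return (x - x.mean()) / x.std(ddof=1)

z1, z2 = zscore(x1), zscore(x2)

# Shortcut: flip the sign of the "worse" variable and average the z-scores.
shortcut = (z1 - z2) / 2

# Actual PCA on the correlation matrix.
Z = np.column_stack([z1, z2])
_, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
pc1 = Z @ eigvecs[:, -1]                # eigenvector of the largest eigenvalue
if np.corrcoef(pc1, z1)[0, 1] < 0:      # fix the arbitrary sign
    pc1 = -pc1

# PC1 and the shortcut should agree up to the constant factor sqrt(2).
same = np.allclose(pc1, np.sqrt(2) * shortcut)
print(same)
```

Because the two indices are proportional, they rank the observations identically. With more than two variables the eigenvectors depend on the data, so this shortcut no longer applies.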

amoeba
  • Thank you for this helpful answer. It was very informative. In fact, I expressed the problem in a rather simplified form; I actually have more than two variables. – Blain Waan Mar 05 '16 at 12:15