Does the range of a variable make a difference to its weight in principal component analysis?

Question

I have a set of 6 statistics which I have measured on many simulations, and I have combined them using PCA, keeping only the first two principal components. But looking at the relative importance of each of the statistics in my PCs, some of them seem to have surprisingly low weights given that I know they separate my simulations very well.

To give an example, statistic A varies very nicely across my simulated parameter space with small standard deviations, but has weights of less than 0.1 in most of the principal components!

I have noticed that statistic A has a small dynamic range (varies between 0.01 and 0.2) while others have much bigger ranges (varying from 2 to 8, for example). Would this cause it to have a lower weighting in my analysis?

Yes! That is why in this case it is often advisable to standardize (z-score) all variables, making them all of similar scale. Please see http://stats.stackexchange.com/questions/53. Your Q is arguably a duplicate of that general and often-asked question. — amoeba, Aug 26 '16 at 12:49
@FJC, post your code and your results. Also clarify if you are doing PCA on covariance or correlation matrix. (If you don't know the difference, then you probably need to spend another day or so reading up on PCA.) Details of how the coefficients are calculated vary from one package to another, and may involve some combination of the scale of the original variable and the eigenvalues. — StasK, Aug 26 '16 at 13:42
Thanks for the help, this was obviously a terminology issue. I'm not a statistician so would not have a clue that the correlation and covariance matrix are significantly different or important in this case. I had searched this site for answers to my question before posting but didn't think to search for those terms. — FJC, Aug 26 '16 at 14:09
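
To illustrate the point made in the comments, here is a minimal sketch using NumPy only. The data are made up: `theta` stands in for a simulation parameter, `A` mimics a statistic that tracks it tightly but only spans roughly 0.01 to 0.2, and `B` mimics a noisier statistic spanning roughly 2 to 8. Only two statistics are used instead of six to keep it short, and the helper `first_pc` is a hypothetical name, not part of any package. It compares the first principal component obtained from the covariance matrix (raw variables) with the one obtained from the correlation matrix (equivalent to z-scoring each variable first).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
theta = rng.uniform(0.0, 1.0, size=n)  # hypothetical simulation parameter

# Hypothetical statistics (assumed for illustration only):
# A follows theta closely but has a small range; B is noisier with a large range.
A = 0.01 + 0.19 * theta + rng.normal(0.0, 0.005, size=n)
B = 2.0 + 6.0 * theta + rng.normal(0.0, 0.8, size=n)
X = np.column_stack([A, B])  # rows are simulations, columns are statistics


def first_pc(matrix):
    """Return the unit-length eigenvector belonging to the largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(matrix)  # eigenvalues come back in ascending order
    return eigvecs[:, -1]


# PCA on the covariance matrix: variables keep their original scales.
pc1_cov = first_pc(np.cov(X, rowvar=False))

# PCA on the correlation matrix: same as standardizing each column first.
pc1_corr = first_pc(np.corrcoef(X, rowvar=False))

print("PC1 weights [A, B], covariance matrix: ", np.abs(pc1_cov))
print("PC1 weights [A, B], correlation matrix:", np.abs(pc1_corr))
```

With the covariance matrix, the first component points almost entirely along `B`, simply because `B` has by far the larger variance, so `A` gets a weight well below 0.1 even though it is the cleaner statistic. With the correlation matrix, both statistics enter on an equal footing. Whether a given PCA routine standardizes by default differs between packages, so it is worth checking the documentation of the one you use.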
