Why is scaling the data needed before principal component analysis (PCA)

Question

I recognize that there are many questions on Cross Validated about scaling and PCA, but, after reading all of them, I still can't find the answer to my question. Several people have said that my question is a duplicate of "PCA on correlation or covariance?". However, that question addresses why BOTH centering and scaling are needed; it does not address each operation individually. Thus, my question is not a duplicate.

Why is scaling the data (often done by dividing by the standard deviation) needed before PCA?

The reason that I usually find is the need to ensure that data measured in different units are standardized. However, it seems that CENTERING the data (subtracting its mean from each data vector) sufficiently addresses this issue.

Suppose that

X1 is in kilometres,
X2 is in metres,
U1 = X1 - mean(X1),
U2 = X2 - mean(X2);

then the effect of using different units of measurement is removed, but raw deviations are retained.
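A minimal sketch of the centering step above, assuming NumPy and made-up data (the names `x1_km` and `x2_m` and the chosen means and spreads are hypothetical): subtracting each variable's mean moves it to mean zero but leaves its spread, and therefore its variance, exactly as it was.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical measurements: X1 in kilometres, X2 in metres
x1_km = rng.normal(loc=5.0, scale=0.3, size=100)
x2_m = rng.normal(loc=800.0, scale=50.0, size=100)

# Centering: subtract each variable's own mean
u1 = x1_km - x1_km.mean()
u2 = x2_m - x2_m.mean()

# The means are now zero (up to floating-point error) ...
print(u1.mean(), u2.mean())
# ... but the variances are exactly those of the raw data
print(x1_km.var(ddof=1), u1.var(ddof=1))
print(x2_m.var(ddof=1), u2.var(ddof=1))
```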

In fact, it seems to me that scaling by the standard deviation actually removes useful information. PCA seeks the data vectors that capture most of the VARIATION in the data set. If you divide each data vector by its standard deviation, then the standard deviation of each data vector is 1, and you have just lost all of the variation that you sought to capture in the first place.
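For comparison, a small sketch of the standardization step itself, again assuming NumPy and hypothetical data (two correlated variables on very different scales): after dividing by the standard deviation, each column has standard deviation 1, yet the values still vary from observation to observation, and the correlation between the two columns (how they vary together) is unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical, correlated variables on very different scales
a = rng.normal(size=200)
x = np.column_stack([a, 1000.0 * (a + 0.5 * rng.normal(size=200))])

# Standardize: centre each column, then divide by its standard deviation
z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

print(z.std(axis=0, ddof=1))               # each column now has SD 1
print(np.corrcoef(x, rowvar=False)[0, 1])  # correlation before standardizing
print(np.corrcoef(z, rowvar=False)[0, 1])  # same correlation after
```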

To those who think that scaling IS necessary before PCA, please tell me why I am wrong.

Thank you.

http://stats.stackexchange.com/questions/69157/why-do-we-need-to-normalize-data-before-analysis — user3494047, Mar 09 '17 at 15:41
centering the data changes nothing about how the data deviates from the mean. It will still deviate from the mean in the same way as before centering. It is just a constant shift of the data. — user3494047, Mar 09 '17 at 15:43
"If X1 is in kilometres, and X2 is in metres, then subtracting each by its mean ensures that each datum deviates from the mean by its standard deviation, not by its absolute deviation." - I don't understand what this is supposed to mean. — amoeba, Mar 09 '17 at 15:44
"then the standard deviation in each data vector is 1" does not mean the variance is 0. — user3494047, Mar 09 '17 at 15:45
user3494047 - EXACTLY! I think that we WANT the data to deviate in the same way after centering! If we divide by the standard deviation, then the manner of deviation changes, and that would result in a loss of information about the variation - we don't want this! Thus, I think that centering the data is enough, and I don't understand why scaling is necessary. — MSE, Mar 09 '17 at 19:53
That distinction is precisely the focus of the duplicate question. — whuber, Mar 09 '17 at 20:30
The question "PCA on correlation or covariance?" asks about BOTH centering and scaling, the result of which is the correlation matrix. I don't see any answer there that addresses why scaling is necessary. Why isn't centering enough? Why is scaling also necessary? That thread does not address this particular question. — MSE, Mar 09 '17 at 21:03
user3494047 - I never said that the variance is 0, so I don't understand what point you're trying to make, let alone how it is relevant to my question. — MSE, Mar 09 '17 at 21:05
amoeba - I'm sorry, and you're right. That comment about X1 and X2 is confusing. I have edited my question to clarify it. — MSE, Mar 09 '17 at 21:05
To those who think that my question is a duplicate, PLEASE show me the link to the answer that addresses my exact question. I have read that thread very carefully, and I don't see an answer to my question. — MSE, Mar 09 '17 at 21:37
"PCA on correlation" means PCA after centering and scaling, "PCA on covariance" means PCA after centering only, hence "PCA on correlation or covariance" is **precisely** about the difference that scaling makes. — amoeba, Mar 10 '17 at 12:38
If you have X1 in km and X2 in m, and if you subtract the means, then "the effect of using different units of measurement" is definitely NOT removed, contrary to what you wrote in your edit. — amoeba, Mar 10 '17 at 12:40
amoeba - then please tell me what each of those 2 steps does. 1) What does subtracting the mean do, without dividing by the standard deviation? 2) What does dividing by the standard deviation do, without subtracting the mean? — MSE, Mar 14 '17 at 13:28
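A minimal sketch of amoeba's two points, assuming NumPy and computing PCA by eigendecomposition of a symmetric matrix (an SVD of the data matrix would be an equivalent route). The data, the second variable (a travel time in minutes) and the helper `first_pc` are hypothetical. The sketch checks that the covariance of the centered data is the covariance matrix and that the covariance of the centered and scaled data is the correlation matrix, then re-expresses X1 from kilometres in metres: the first principal component of the covariance matrix swings from the time variable to the distance variable, while that of the correlation matrix stays the same.

```python
import numpy as np

def first_pc(m):
    """Unit eigenvector for the largest eigenvalue of a symmetric matrix.

    The sign is fixed so that the largest-magnitude entry is positive,
    which makes results comparable across unit choices.
    """
    vals, vecs = np.linalg.eigh(m)
    v = vecs[:, np.argmax(vals)]
    return v * np.sign(v[np.argmax(np.abs(v))])

rng = np.random.default_rng(3)
n = 500
dist_km = rng.normal(10.0, 0.5, size=n)                  # hypothetical X1, in kilometres
time_min = 6.0 * dist_km + rng.normal(0.0, 3.0, size=n)  # hypothetical X2, in minutes

results = {}
for unit, scale in [("km", 1.0), ("m", 1000.0)]:
    x = np.column_stack([scale * dist_km, time_min])
    xc = x - x.mean(axis=0)           # centering only
    z = xc / x.std(axis=0, ddof=1)    # centering and scaling

    cov = xc.T @ xc / (n - 1)         # "PCA on covariance" uses this matrix
    corr = z.T @ z / (n - 1)          # "PCA on correlation" uses this matrix
    print(np.allclose(cov, np.cov(x, rowvar=False)),
          np.allclose(corr, np.corrcoef(x, rowvar=False)))

    results[unit] = (first_pc(cov), first_pc(corr))
    print(unit, "PC1 on covariance:", results[unit][0].round(3),
          "PC1 on correlation:", results[unit][1].round(3))
```

In this made-up example, centering alone leaves the leading direction at the mercy of whichever variable happens to have the larger numbers in the chosen units; scaling removes that dependence at the cost of giving every variable unit variance.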
