If I use PCA before clustering, do I need to use PCA scores (principal components) to run the clustering?

Question

I want to apply PCA first and then run a clustering algorithm such as K-Means on the result.

My understanding is that I run PCA to find the loadings for each original variable, then calculate scores for each record as linear combinations of the row values weighted by each PC's loadings, and finally run clustering on those PCA scores.

Is this correct, or do I need to do anything else before running the clustering on them?
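The workflow described in the question can be sketched roughly as follows. This is only an illustration using NumPy: the data `X` is made up, and standardizing before PCA is an assumption rather than a requirement.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 records, 3 variables on very different scales.
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])

# Standardize each variable (an optional choice; PCA itself only needs centering).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Loadings: eigenvectors of the covariance matrix of Z, sorted by decreasing variance.
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, loadings = eigvals[order], eigvecs[:, order]

# Scores: each record expressed as linear combinations of its values and the loadings.
scores = Z @ loadings
```

`scores` (or a subset of its columns) is what would then be handed to the clustering algorithm.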

Sounds right to me. If using R, `prcomp(d)$x` holds the rotated data. Don't forget that `prcomp` does not scale the data by default. – generic_user May 19 '16 at 14:27

score 4 · Accepted Answer · answered May 21 '16 at 20:40

PCA decomposes the covariance matrix into a rotation (the eigenvectors) and a scaling (the eigenvalues).

If you only use the rotation, k-means should give you exactly the same result, because a rotation preserves Euclidean distances. So you have gained nothing.
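A quick numerical check of this claim, as a sketch on made-up data: if all components are kept and nothing is rescaled, the pairwise Euclidean distances are unchanged, which is everything k-means sees (assuming the same initialization).

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 50 records, 4 variables with different spreads.
X = rng.normal(size=(50, 4)) * np.array([1.0, 2.0, 5.0, 0.5])
Xc = X - X.mean(axis=0)

# PCA rotation: right singular vectors of the centered data.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T  # all components kept, no rescaling

def pairwise_sq_dists(A):
    """Squared Euclidean distance between every pair of rows."""
    diff = A[:, None, :] - A[None, :, :]
    return (diff ** 2).sum(axis=-1)

d_before = pairwise_sq_dists(Xc)
d_after = pairwise_sq_dists(scores)
print(np.allclose(d_before, d_after))  # True
```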

Two ways of using the scaling information:

- scale every projected attribute to unit variance;
- discard attributes with low variance;
- or do both.
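The options listed above might look like this in NumPy. The data and the 1% variance threshold are arbitrary choices for illustration, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data with one nearly constant direction.
X = rng.normal(size=(300, 3)) * np.array([5.0, 1.0, 0.01])
Xc = X - X.mean(axis=0)

_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T
variances = s ** 2 / (len(X) - 1)  # variance of each projected attribute

# Option 1: scale every projected attribute to unit variance (whitening).
whitened = scores / np.sqrt(variances)

# Option 2: discard attributes with low variance
# (here: components explaining less than 1% of the total; arbitrary cutoff).
keep = variances / variances.sum() > 0.01
reduced = scores[:, keep]

# Both: drop the low-variance components, then whiten what remains.
reduced_whitened = whitened[:, keep]
```

Note that whitening without discarding anything blows up the nearly constant direction to the same weight as the others, which is one reason to combine the two.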

answered May 21 '16 at 20:40

Has QUIT--Anony-Mousse

I standardize the variables before I do PCA, but do I need to standardize the new values on the PC axes again? My thinking is that I don't need to, because PCA already centers them at zero mean. Am I correct? – user122358 May 22 '16 at 06:40
You don't need to scale them prior to PCA. The results will be different though. After the rotation, you will keep the zero mean, but *not* the variances, so scaling *does* have an effect. – Has QUIT--Anony-Mousse May 22 '16 at 07:12
+1. I illustrated this in my answer here: http://stats.stackexchange.com/questions/230319. – amoeba Aug 17 '16 at 23:17
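
The point made in the comments, that the rotated scores keep the zero mean but not the unit variances even when the inputs were standardized, can be checked with a small sketch like the one below. The correlated data is invented for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical correlated data: the second variable mostly follows the first.
a = rng.normal(size=500)
X = np.column_stack([a, a + 0.3 * rng.normal(size=500), rng.normal(size=500)])

# Standardize first: every input variable now has mean 0 and variance 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

_, _, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt.T

score_means = scores.mean(axis=0)        # all approximately 0
score_vars = scores.var(axis=0, ddof=1)  # unequal; they sum to 3, the number of variables
print(score_means)
print(score_vars)
```

Rescaling `scores` to unit variance at this point therefore does change the distances that k-means works with.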
