I have two questions about PCA:

  1. Is a higher weight assigned to variables that do not necessarily have higher variance themselves but are more representative of the variation in the other variables? (I.e., does PCA aim to capture the common variation?)
  2. Since the input to PCA is just the correlation matrix, which is not affected by scale, is standardization required before PCA?

Below is a toy dataset I've created to help explain my questions. Please correct me if anything is wrong. Thanks so much!

```r
set.seed(99)
a = sample(1:1000, 300, replace = TRUE)
b = sample(1:1000, 300)

# standardize a and b
a_t = (a - mean(a)) / sd(a)
b_t = (b - mean(b)) / sd(b)
sd(a)
sd(b)
# the variation of the two variables is different
x = as.matrix(cbind(a, b))
xcor = cor(x)

xcor
x_t = as.matrix(cbind(a_t, b_t))
xcor_t = cor(x_t)

xcor_t
# the correlation matrix is not affected by the standardization
# eigendecomposition of the correlation matrix of the raw variables
out = eigen(xcor)
va = out$values
ve = out$vectors

ve
# the two variables are assigned the same weight (in absolute value)
# eigendecomposition of the correlation matrix of the standardized variables
out_t = eigen(xcor_t)
va_t = out_t$values
ve_t = out_t$vectors

ve_t
# the standardized variables share the same eigenvectors as the unstandardized variables
```
Yuan
  • (The key idea connecting this post to your post is that PCA when the data are standardized by standard deviation is PCA of the *correlation* matrix, rather than the more common PCA of a *covariance* matrix.) – Sycorax Feb 03 '20 at 21:24
  • Got it! thank you so much! – Yuan Feb 03 '20 at 21:29
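
The point made in the comment above can be checked numerically. What follows is only a minimal sketch under stated assumptions: the variables `u` and `v` are made-up toy data (not from the question), with `v` deliberately put on a much larger scale. It compares the eigenvectors of the covariance matrix, the correlation matrix, and the covariance matrix of the standardized data.

```r
set.seed(1)

# Hypothetical toy data: v depends on u but lives on a much larger scale
u <- rnorm(200)
v <- 100 * (0.5 * u + rnorm(200))
x <- cbind(u, v)

# PCA of the covariance matrix: sensitive to the scale of each variable
ev_cov <- eigen(cov(x))$vectors

# PCA of the correlation matrix: scale-free
ev_cor <- eigen(cor(x))$vectors

# Standardize first (center, then divide each column by its sd);
# the covariance matrix of the result is the correlation matrix of x
x_std <- scale(x)
ev_std <- eigen(cov(x_std))$vectors

abs(ev_cov[, 1])                     # first loading vector dominated by v
abs(ev_cor[, 1])                     # equal loadings for both variables
all.equal(cov(x_std), cor(x))        # standardized covariance == correlation
all.equal(abs(ev_std), abs(ev_cor))  # same eigenvectors up to sign
```

In this sketch, standardization only matters when the eigendecomposition is applied to the covariance matrix; starting from `cor()` already removes the scale. With exactly two variables and a nonzero correlation, the correlation matrix always has eigenvectors with equal absolute loadings of `1/sqrt(2)`, which is why the toy example in the question shows identical weights.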
