What is the difference between
- normalizing the variables and then doing PCA;
- using the scale=TRUE option (without normalizing the variables) in the prcomp function in R?
No difference. Type debug(prcomp) before running prcomp. The third line of the function reads: x <- scale(x, center = center, scale = scale.); i.e. the scaling is either done inside the function, if you set scale = TRUE in the function call, or it has already been done by you beforehand.
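The equivalence can be checked directly. The following is a minimal sketch on made-up data (the data frame `toy` and its columns are hypothetical, not from the question). Note that the formal argument of prcomp is named `scale.`; writing `scale = TRUE` works through R's partial argument matching. Loadings are compared in absolute value because the sign of a principal component is arbitrary.

```r
# Hypothetical toy data: two variables recorded on very different scales.
set.seed(42)
toy <- data.frame(a = rnorm(20, mean = 10, sd = 2),
                  b = rnorm(20, mean = 500, sd = 80))

# Option 1: standardize by hand, then run PCA without further scaling.
pca_manual <- prcomp(scale(toy), scale. = FALSE)

# Option 2: pass the raw data and let prcomp() do the scaling.
pca_builtin <- prcomp(toy, scale. = TRUE)

# Component standard deviations and (absolute) loadings should agree
# up to floating-point tolerance.
same_sdev <- isTRUE(all.equal(pca_manual$sdev, pca_builtin$sdev))
same_rotation <- isTRUE(all.equal(abs(pca_manual$rotation),
                                  abs(pca_builtin$rotation)))
print(c(same_sdev = same_sdev, same_rotation = same_rotation))
```

One visible difference remains: the `center` and `scale` components stored in the returned object describe what prcomp itself did, so they differ between the two calls even though the components are the same.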
Having said that, when applying PCA it is generally a good idea to scale your variables. Otherwise the magnitude of certain variables dominates the associations between the variables in the sample. Unless all your variables are recorded on the same scale and/or the differences in variable magnitudes are of interest, I would suggest you normalise your data prior to PCA. This issue has been revisited multiple times within CV, e.g. 1, 2, 3.
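To illustrate how magnitude can dominate, here is a small sketch on simulated data. The variables `height_m` and `weight_g` and all the numbers used to generate them are assumptions made up for this illustration. Weight is deliberately recorded in grams so that its variance dwarfs that of height in metres.

```r
# Simulated (hypothetical) data: height in metres, weight in grams.
set.seed(123)
n <- 200
height_m <- rnorm(n, mean = 1.7, sd = 0.1)
weight_g <- 1000 * (70 + 50 * (height_m - 1.7) + rnorm(n, sd = 5))
people <- data.frame(height_m, weight_g)

pca_raw <- prcomp(people, scale. = FALSE)  # covariance-based PCA
pca_std <- prcomp(people, scale. = TRUE)   # correlation-based PCA

# Without scaling, PC1 points almost entirely along weight_g,
# simply because that column has by far the largest variance.
print(round(pca_raw$rotation, 4))

# With scaling, both variables contribute equally to PC1
# (for two standardized variables the loadings are +/- 1/sqrt(2)).
print(round(pca_std$rotation, 4))
```

Re-expressing weight in kilograms would change the unscaled result but leave the scaled one untouched, which is the practical argument for standardizing when the units are arbitrary.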
Using the correlation matrix is equivalent to standardizing each of the variables (to mean 0 and standard deviation 1). In general, PCA with and without standardizing will give different results, especially when the scales of the variables differ.
scale=TRUE bases the PCA on the correlation matrix, and scale=FALSE on the covariance matrix.
For example:
# my data
set.seed(1)
x <- rnorm(10, 50, 4)
y <- rnorm(10, 50, 7)
df <- data.frame(x, y)
# PCA based on the covariance matrix and on the correlation matrix
PCA_df.cov <- prcomp(df, scale = FALSE)
PCA_df.corr <- prcomp(df, scale = TRUE)
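One way to check the covariance/correlation claim on this example is to compare the squared component standard deviations from prcomp with the eigenvalues of cov(df) and cor(df). This is only a sketch; the names `cov_match` and `corr_match` are introduced here for illustration, and the data are regenerated so that the block runs on its own.

```r
# Same data as in the example above.
set.seed(1)
x <- rnorm(10, 50, 4)
y <- rnorm(10, 50, 7)
df <- data.frame(x, y)

PCA_df.cov  <- prcomp(df, scale. = FALSE)
PCA_df.corr <- prcomp(df, scale. = TRUE)

# The squared standard deviations of the components are the eigenvalues
# of the covariance matrix (unscaled) or correlation matrix (scaled).
cov_match  <- isTRUE(all.equal(PCA_df.cov$sdev^2,  eigen(cov(df))$values))
corr_match <- isTRUE(all.equal(PCA_df.corr$sdev^2, eigen(cor(df))$values))
print(c(cov_match = cov_match, corr_match = corr_match))
```

Both comparisons should come out TRUE up to numerical tolerance, since prcomp divides by n - 1 just as cov() and cor() do.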