7

What is the difference between

  1. normalizing the variables and doing PCA;
  2. using the scale=TRUE option (without normalizing the variables) in the prcomp function in R?
amoeba
Nandha Kumar M
  • I erased your last sentence/paragraph because it was very hard to understand while your question is very clear already without it. – amoeba Mar 18 '17 at 15:59

2 Answers

9

No difference. To see this, look at the source of `stats:::prcomp.default`, the method that `prcomp` dispatches to for a matrix or data frame (for example, call `debug(stats:::prcomp.default)` before running `prcomp`). One of its first lines reads: `x <- scale(x, center = center, scale = scale.)`; i.e. the scaling is done either inside the function, if you set `scale = TRUE` in the call, or by you beforehand.
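
A minimal sketch of this equivalence, assuming simulated data and the hypothetical object names `pca_manual` and `pca_builtin`. The comparison of loadings and scores uses absolute values because the sign of a principal component is arbitrary.

```r
# Simulated data: two variables on different scales (assumption for illustration)
set.seed(1)
df <- data.frame(x = rnorm(10, 50, 4), y = rnorm(10, 50, 7))

# Option 1: standardize the variables yourself, then run PCA
pca_manual <- prcomp(scale(df))

# Option 2: let prcomp standardize via its scale. argument
pca_builtin <- prcomp(df, scale. = TRUE)

# Standard deviations, loadings and scores agree (up to sign)
same_sdev     <- isTRUE(all.equal(pca_manual$sdev, pca_builtin$sdev))
same_loadings <- isTRUE(all.equal(abs(pca_manual$rotation), abs(pca_builtin$rotation)))
same_scores   <- isTRUE(all.equal(abs(unname(pca_manual$x)), abs(unname(pca_builtin$x))))
print(c(sdev = same_sdev, loadings = same_loadings, scores = same_scores))
```

The two fitted objects still differ in their bookkeeping: `pca_builtin` stores the original column means and standard deviations in `$center` and `$scale`, whereas `pca_manual` only sees the data after standardization.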

Having said that, when applying PCA in general it is a good idea to scale your variables. Otherwise the magnitude of certain variables dominates the associations between the variables in the sample. Unless all your variables are recorded on the same scale and/or the difference in variable magnitudes is of interest, I would suggest you normalise your data prior to PCA. This issue has been revisited multiple times within CV, e.g. 1, 2, 3.
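
As a rough illustration of how one large-variance variable can dominate, here is a sketch using R's built-in `USArrests` data (my choice of data set, not one mentioned above). `Assault` has a much larger variance than the other three columns, so without scaling the first component is almost entirely `Assault`.

```r
# Variances of the raw variables: Assault is far larger than the rest
vars <- sapply(USArrests, var)
print(round(vars, 1))

# PCA without and with scaling
pca_raw    <- prcomp(USArrests, scale. = FALSE)
pca_scaled <- prcomp(USArrests, scale. = TRUE)

# Absolute loadings on the first principal component
load_raw    <- abs(pca_raw$rotation[, 1])
load_scaled <- abs(pca_scaled$rotation[, 1])
print(round(rbind(raw = load_raw, scaled = load_scaled), 2))
```

After scaling, the first component spreads its weight across the variables rather than tracking the one with the largest units.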

usεr11852
0

Using the correlation matrix is equivalent to standardizing each of the variables (to mean 0 and standard deviation 1). In general, PCA with and without standardizing will give different results, especially when the variables are on different scales.

`scale=TRUE` bases the PCA on the correlation matrix and `scale=FALSE` on the covariance matrix.

For example:

# my data
set.seed(1)
x <- rnorm(10, 50, 4)
y <- rnorm(10, 50, 7)
df <- data.frame(x, y)

# PCA based on the covariance matrix and on the correlation matrix
PCA_df.cov <- prcomp(df, scale = FALSE)
PCA_df.corr <- prcomp(df, scale = TRUE)
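
To tie the example back to the covariance/correlation claim, the following sketch (with hypothetical names `pca_cov` and `pca_corr`, and the same simulated data) checks that the component variances from `prcomp` match the eigenvalues of `cov(df)` when `scale. = FALSE` and of `cor(df)` when `scale. = TRUE`.

```r
# Same simulated data as in the example
set.seed(1)
df <- data.frame(x = rnorm(10, 50, 4), y = rnorm(10, 50, 7))

pca_cov  <- prcomp(df, scale. = FALSE)
pca_corr <- prcomp(df, scale. = TRUE)

# Squared sdev values are the variances of the principal components;
# they should equal the eigenvalues of the matching matrix
cov_match  <- isTRUE(all.equal(pca_cov$sdev^2,  eigen(cov(df))$values))
corr_match <- isTRUE(all.equal(pca_corr$sdev^2, eigen(cor(df))$values))
print(c(covariance = cov_match, correlation = corr_match))
```

Since the correlation matrix is the covariance matrix of the standardized variables, this is the same statement as saying that `scale. = TRUE` standardizes the data before the decomposition.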
  • The question asks what the difference is between scaling the data and using `scale=TRUE`. Your code example just shows that toggling `scale=TRUE` and `scale=FALSE` produces different results, which doesn't do much to explain why those results are different. I think the code example would be clearer if you used it to demonstrate that scaling the data and setting `scale=TRUE` produce the same result, and to show that this is the same as performing PCA on the covariance matrix. In other words, use code to demonstrate the claims you make in text. – Sycorax Mar 29 '21 at 15:48