In many receptor-modeling studies, after performing PCA, the authors "rescale" the varimax-rotated PC scores (which are standardized, with mean zero and standard deviation 1) to so-called absolute principal component scores (APCS) before performing the multiple linear regression (MLR), so that the source contribution of each factor can be estimated in the original concentration units.
This is done by:
1. calculate the z-score for absolute zero concentrations (i.e. take a vector of all zeroes, subtract the sample mean and divide by the sample standard deviation);
2. calculate the rotated PC score on each component for this z-scored absolute zero from step 1;
3. subtract the "zero" PC score (from step 2) from the true scores.
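The three steps above can be sketched as a single function. This is only a minimal sketch under stated assumptions, not a reference implementation of APCS: the name apcs_sketch and its return structure are hypothetical, it assumes ncomp >= 2 (stats::varimax needs at least two columns), and it assumes the rotated loadings have full column rank so that the pseudo-inverse can be written as L (L'L)^-1 in base R.

```r
# Hypothetical helper (not from any package): APCS via steps 1-3.
# X: numeric matrix or data frame of concentrations; ncomp: components kept.
apcs_sketch <- function(X, ncomp) {
  stopifnot(ncomp >= 2)                 # varimax() needs >= 2 columns
  X    <- as.matrix(X)
  mu   <- colMeans(X)
  sdev <- apply(X, 2, sd)

  # PCA on standardized data, loadings scaled by the component sdev
  pca <- prcomp(X, center = TRUE, scale. = TRUE)
  raw_loadings <- pca$rotation[, 1:ncomp, drop = FALSE] %*%
    diag(pca$sdev[1:ncomp], ncomp, ncomp)
  rot_loadings <- unclass(varimax(raw_loadings)$loadings)

  # Score weights: transposed pseudo-inverse of the rotated loadings,
  # written as L (L'L)^-1 (assumes full column rank)
  weights <- rot_loadings %*% solve(crossprod(rot_loadings))

  # Standardized rotated scores for the real samples
  scores <- scale(X, center = mu, scale = sdev) %*% weights

  # Step 1: z-score of an all-zero sample
  z0 <- matrix(-mu / sdev, nrow = 1)
  # Step 2: rotated score of that artificial sample
  scores0 <- drop(z0 %*% weights)
  # Step 3: subtract the "zero" score from every row
  apcs <- sweep(scores, 2, scores0, "-")

  list(scores = scores, scores0 = scores0, apcs = apcs, weights = weights)
}

res <- apcs_sketch(iris[, 1:4], ncomp = 2)
head(res$apcs)
```

Note that (z - z0) has entries x / sd, so steps 1-3 are algebraically the same as applying the weights to the uncentered data divided by its standard deviations. The sign of each APCS column therefore depends on the signs of the weights.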
I tried to follow the procedure but still got negative values. Can someone take a look and see where I went wrong? Here is the code, using the iris data set (most of it comes from a previous discussion):
irisX <- iris[,1:4]
ncomp <- 2
pca_iris <- prcomp(irisX, center = TRUE, scale. = TRUE)
rawLoadings <- pca_iris$rotation[,1:ncomp] %*% diag(pca_iris$sdev, ncomp, ncomp)
rotatedLoadings <- varimax(rawLoadings)$loadings
invLoadings <- t(pracma::pinv(rotatedLoadings))
scores <- scale(irisX) %*% invLoadings # my scores from rotated loadings which are standardized
# want to use APCS to do MLR instead of these scores
#step 1: create artificial sample with zero concentrations for all variables
z0i <- matrix(-colMeans(irisX)/sqrt(apply(irisX, 2, var)), nrow = 1)
#step 2: find its rotated PC score by multiplying with the same weights (invLoadings) used for the original sample
scores0 <- as.numeric(z0i %*% invLoadings) # my absolute zero PC scores (supposedly...)
#step 3: now to calculate my new "APCS"
scores0 <- matrix(rep((scores0), each = nrow(scores)),nrow = nrow(scores))
ACPS <- scores - scores0
This results in
> head(ACPS)
         [,1]       [,2]
[1,] 4.274291  -9.339044
[2,] 3.980231  -8.167430
[3,] 3.937934  -8.548838
[4,] 3.886160  -8.284854
[5,] 4.262470  -9.527271
[6,] 4.724111 -10.296421
and one can see that the APCS (stored in ACPS) still contain negative values. Why?