I am running a PCA on one dataset and obtain a set of regressors. I would now like to decompose a second dataset onto the same basis of regressors and find out how much of the variance of the second dataset each regressor explains.
- What do you call `regressors` in that context? – ttnphns Sep 08 '16 at 17:31
- By regressors I mean the eigenvectors identified by the PCA. – Alex Sep 08 '16 at 20:33
1 Answer
Here are the steps:
- Get the transformation (rotation matrix) from your PCA on the first data set.
- Apply the same transformation to the second data set.
- Calculate the variance of the transformed data, column by column.
Here is the code on a toy data set.
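    set.seed(0)

    # Two toy data sets: the standardized original and a standardized noisy copy
    d1 <- scale(USArrests)
    noise <- matrix(runif(nrow(USArrests) * ncol(USArrests)),
                    ncol = ncol(USArrests)) * 10
    d2 <- scale(USArrests + noise)

    # PCA on the first data set (already standardized, so no extra scaling)
    pr.out <- prcomp(d1, scale. = FALSE)
    pr.var <- apply(pr.out$x, 2, var)
    pve <- pr.var / sum(pr.var)
    plot(cumsum(pve), xlab = "Principal Component",
         ylab = "Cumulative Proportion of Variance Explained",
         ylim = c(0.5, 1), type = "b", lwd = 2)
    grid()

    # Project the second data set onto the same rotation, then take the
    # variance of each projected column
    x2 <- as.matrix(d2) %*% pr.out$rotation
    pr.var2 <- apply(x2, 2, var)
    pve2 <- pr.var2 / sum(pr.var2)
    lines(cumsum(pve2), type = "b", col = 2, lwd = 2)
    legend(2, 0.7, c("Variance Explained in Data 1",
                     "Variance Explained in Data 2"),
           lwd = c(2, 2), col = c(1, 2))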
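The output plot (image not reproduced here) shows the cumulative proportion of variance explained against the principal component index, with one curve per data set.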
Note that the original data has 4 features. If we do not reduce the dimension and keep all 4 components, we can always explain $100\%$ of the variance in any data set. This is why the two curves meet at the top-right corner.
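The note above can be checked numerically. The following is a minimal sketch in R, assuming the same toy data construction as in the answer's code; the names `noise`, `rot`, `total_before` and `total_after` are introduced here only for illustration. Because the rotation matrix is orthogonal, the summed column variances of the second data set should be the same before and after projection, so the proportions over all 4 components sum to 1.

```r
# Toy data, built the same way as in the answer's code
set.seed(0)
d1 <- scale(USArrests)
noise <- matrix(runif(nrow(USArrests) * ncol(USArrests)),
                ncol = ncol(USArrests)) * 10
d2 <- scale(as.matrix(USArrests) + noise)

# Rotation learned on the first data set, applied to the second
rot <- prcomp(d1, scale. = FALSE)$rotation
x2 <- d2 %*% rot

# An orthogonal rotation preserves the total variance
total_before <- sum(apply(d2, 2, var))
total_after <- sum(apply(x2, 2, var))

# Proportion of the variance of d2 carried by each component of d1
pve2 <- apply(x2, 2, var) / total_after
print(c(before = total_before, after = total_after))
print(cumsum(pve2))
```

If only the first few components are kept, the last value of `cumsum(pve2)` over those components is the share of the second data set's variance that the reduced basis captures.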

Haitao Du
- Great, thank you very much for your help. I am actually using MATLAB, so I guess it should be something like this, right? `% svd decomposition` `[u,s,v]=svd(data1);` `% rotation of the second dataset` `data2rot=inv(u)*data2*inv(v');` `% calculation of the variance` `var(data2rot)/sum(var(data2rot))` – Alex Sep 08 '16 at 20:26
- I do not have the MATLAB code in mind right now :). You can check the numbers by comparing the two systems. But one piece of advice: never use the `inv` function, because it is not numerically stable. – Haitao Du Sep 08 '16 at 20:45
- More about `inv`: suppose you want to solve a linear system $Ax=b$. Do NOT use `inv(A)*b`; use `linsolve` instead. – Haitao Du Sep 08 '16 at 20:47
- Thank you. It seems to give reasonable results on two different datasets, but wrong results if I re-apply it to the same dataset (the `data2rot` matrix is then diagonal, so the variance calculation gives wrong results). – Alex Sep 08 '16 at 21:43
- In fact, I think I will try to learn how to import my data into R and use your code. Thank you so much. – Alex Sep 08 '16 at 22:48
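On the `inv` advice in the comments: the comments refer to MATLAB, but the same idea can be sketched in R, the language of the answer's code. In R, `solve(A, b)` solves the system directly, while `solve(A) %*% b` first forms the explicit inverse. The matrix `A` and vector `b` below are hypothetical values chosen only for illustration.

```r
# Hypothetical small system A x = b, used only for illustration
A <- matrix(c(4, 1, 2,
              1, 3, 0,
              2, 0, 5), nrow = 3, byrow = TRUE)
b <- c(1, 2, 3)

# Preferred: solve the linear system directly
x_direct <- solve(A, b)

# Discouraged: form the explicit inverse, then multiply
x_inverse <- drop(solve(A) %*% b)

# Largest absolute residual of the direct solution
residual <- max(abs(A %*% x_direct - b))
print(x_direct)
print(residual)
```

On a small, well-conditioned system like this one the two results agree; the difference matters for large or ill-conditioned matrices.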
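On the SVD route raised in the comments: the following is a minimal sketch in R, not a translation of the MATLAB snippet. It assumes the first data set is centered and that the rotation is taken from its right singular vectors `v`. The second data set is projected with `v` alone; `u` describes the rows of the first data set and is not applied to the second. The names `scores1`, `scores2` and `pve1_from_sv` are introduced here for illustration. Re-applying the projection to the first data set should reproduce the usual PCA proportions, which gives a simple sanity check.

```r
# Toy data, built the same way as in the answer's code
set.seed(0)
d1 <- scale(USArrests)
noise <- matrix(runif(nrow(USArrests) * ncol(USArrests)),
                ncol = ncol(USArrests)) * 10
d2 <- scale(as.matrix(USArrests) + noise)

# Right singular vectors of the centered first data set are the PCA axes
s <- svd(d1)
v <- s$v

# Project both data sets onto the same axes (v only, no u)
scores1 <- d1 %*% v
scores2 <- d2 %*% v

# Proportion of variance per component in each data set
pve1 <- apply(scores1, 2, var) / sum(apply(scores1, 2, var))
pve2 <- apply(scores2, 2, var) / sum(apply(scores2, 2, var))

# Sanity check on the first data set: proportions from squared singular values
pve1_from_sv <- s$d^2 / sum(s$d^2)
print(rbind(pve1, pve1_from_sv, pve2))
```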