
I am running a PCA on a dataset and obtain a set of regressors. Now I would like to decompose another dataset onto the same basis of regressors and find out how much of the second dataset's variance each regressor explains.

amoeba
Alex

1 Answer


Here are the steps:

  • Get the transformation (rotation) from your PCA.
  • Apply the transformation to the second data set.
  • Calculate the variance of the transformed data, column by column.
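
In matrix form, the steps above can be sketched as follows. The notation is an assumption added here for illustration, not part of the original answer: $X_2$ is the second data matrix (column-centered), $v_1, \dots, v_p$ are the principal directions obtained from the first data set (the columns of the rotation matrix), and $z_j$ is the $j$-th column of the transformed second data set.

$$z_j = X_2 v_j, \qquad \mathrm{pve}_j = \frac{\operatorname{Var}(z_j)}{\sum_{k=1}^{p} \operatorname{Var}(z_k)}, \qquad j = 1, \dots, p$$

Here $\mathrm{pve}_j$ is the proportion of the second data set's variance captured along the $j$-th direction of the first data set's PCA.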

Here is the code on a toy data set.

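# Toy data: d1 is the standardized USArrests data,
# d2 is a noisy copy of it, standardized separately
set.seed(0)
d1=scale(USArrests)
d2=scale(USArrests+matrix(runif(nrow(USArrests)*ncol(USArrests)),
                          ncol=ncol(USArrests))*10)

# PCA on the first data set (already scaled above)
pr.out=prcomp(d1, scale.=FALSE)
pr.var=apply(pr.out$x,2,var)   # variance of each principal component score
pve=pr.var/sum(pr.var)         # proportion of variance explained
plot(cumsum(pve), xlab="Principal Component",
     ylab="Cumulative Proportion of Variance Explained",
     ylim=c(0.5,1),type='b', lwd=2)
grid()

# Project the second data set onto the same rotation and repeat
x2=as.matrix(d2) %*% pr.out$rotation
pr.var2=apply(x2,2,var)
pve2=pr.var2/sum(pr.var2)
lines(cumsum(pve2),type='b',col=2,lwd=2)
legend(2, 0.7, c("Variance Explained in Data 1",
                   "Variance Explained in Data 2"), lwd=c(2,2), col=c(1,2))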

The output plot is:

[Plot: cumulative proportion of variance explained against principal component, one curve for data 1 and one for data 2]

Note that the original data has 4 features. If we do not reduce the dimension and use all 4 components, we can always explain $100\%$ of the variance in any data set. This is why the two curves meet at the top right corner.
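
A short derivation of why the curves must meet, using notation assumed here for illustration: let $V$ be the full $p \times p$ rotation matrix from the first PCA, $S_2$ the sample covariance matrix of the second data set, and $z_j$ the $j$-th column of the transformed second data set. The transformed data has covariance $V^\top S_2 V$, and since $V$ is orthogonal ($V V^\top = I$),

$$\sum_{j=1}^{p} \operatorname{Var}(z_j) = \operatorname{tr}(V^\top S_2 V) = \operatorname{tr}(S_2 V V^\top) = \operatorname{tr}(S_2)$$

So the column variances of the rotated data always add up to the total variance of the second data set, and the cumulative proportion reaches 1 once all $p$ components are included. This holds only when no component is dropped.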

Haitao Du
  • Great, thank you very much for your help. Actually, I am using MATLAB, so I guess it should be something like this, right? `% svd decomposition ` `[u,s,v]=svd(data1);` `% rotation of the second dataset` `data2rot=inv(u)*data2*inv(v');` `% calculation of the variance` `var(data2rot)/sum(var(data2rot))` – Alex Sep 08 '16 at 20:26
  • I do not have the MATLAB code in mind right now :). You can check the numbers by comparing the two systems. But one piece of advice: never use the `inv` function, because it is not numerically stable. – Haitao Du Sep 08 '16 at 20:45
  • More about `inv`: suppose you want to solve a linear system $Ax=b$. Do NOT use `inv(A)*b`; use `linsolve` instead. – Haitao Du Sep 08 '16 at 20:47
  • Thank you. Actually, it seems to give me reasonable results on two datasets but wrong results if I re-apply it to the same dataset (the data2rot matrix being diagonal, the var calculation gives wrong results). – Alex Sep 08 '16 at 21:43
  • In fact, I think I will try to learn how to import my data into R and use your code. Thank you so much. – Alex Sep 08 '16 at 22:48
  • @Alex no upvote no acceptance :)? – Haitao Du Sep 09 '16 at 03:14