
I am very new to PCA and, just as an exercise, I was trying to reconstruct the original dataset from the loadings.

Suppose I have a matrix A holding the original dataset, and C is the matrix of z-scores of A. I want to recover A from the loadings and C as A = C L', where L is the matrix of loadings (as described here: How to interpret PCA loadings?). After running the PCA, I computed the loading matrix L by multiplying each column of the eigenvector matrix by the square root of its eigenvalue (i.e. the first column by the square root of the first eigenvalue, the second column by the square root of the second eigenvalue, and so on). However, when I computed C L', I got a matrix different from A.

Where did I go wrong? I searched around but could not find anything.

Mazzola
  • (1) Why do you think that A should be equal to CL'? (2) Do you compute your PCA using A or using C, i.e. using covariances or using correlations? – amoeba Dec 01 '16 at 20:42
  • Well, I'm not sure why. I just tried to follow something I found here http://stats.stackexchange.com/questions/92499/how-to-interpret-pca-loadings. I'm doing PCA on correlations, though. How do I reconstruct the original matrix from the loading matrix? – Mazzola Dec 01 '16 at 20:56
  • Thanks, this link explains the confusion. Matrix C there is the matrix of component values (aka scores), it's not the matrix of z-scores of A! – amoeba Dec 01 '16 at 20:58
  • See here http://stats.stackexchange.com/questions/229092 about reconstructing. To reconstruct original matrix, you multiply PC scores with PC eigenvectors. Or, equivalently, you multiply z-scored PC scores with PC loadings. And there is no way you can reconstruct A from the results of PCA analysis that you did on C. – amoeba Dec 01 '16 at 20:59
  • I'm so sorry... I have read a lot about loadings and I'm still quite confused. I understand that loadings are the weights that multiply the PCA components to obtain the original dataset. But how can I verify this? – Mazzola Dec 01 '16 at 21:07
  • Do PCA on correlations (i.e. on your matrix C). Obtain matrix of loadings L as you described. Also obtain the matrix of scores S and standardize it, i.e. compute z-scores of PCA scores, let's call it Z. Now you will be able to check that C=ZL', i.e. you can reconstruct C by multiplying "components" with "loadings". – amoeba Dec 01 '16 at 21:09
  • OK, thanks. I verified what you said and found that both Y V' and Ys L' give back the values of the variables in C, where Y is the matrix of PCA scores, V is the eigenvector matrix, Ys is the z-score of the PCA scores and L is the loading matrix. However, I still don't understand why they give the same result. How can I see this via matrix formulas? I tried to match those relations: since Ys = Y S^-1, I obtained Y V' = Y S^-1 L', so L = V S, where S is the diagonal matrix with the standard deviations of the PC scores on the diagonal. But I computed L as V S1, where S1 is the diagonal matrix with the eigenvalues of the C matrix – Mazzola Dec 02 '16 at 03:04
  • OK, I redid the calculation and sqrt(eigenvalues) actually do match the standard deviations, so the relation is trivial. Thanks a lot again. – Mazzola Dec 02 '16 at 03:19
  • Glad that you figured it out. – amoeba Dec 02 '16 at 11:32
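
The relations worked out in the comments above can be checked numerically. The following is a minimal NumPy sketch, not code from the thread: the random data, the variable names, and the use of `ddof=1` for the standard deviations are all assumptions made for illustration. It follows the thread's notation (A original data, C its z-scores, V eigenvectors, Y scores, Z standardized scores, L loadings).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data (assumption): 200 observations of 4 correlated variables.
A = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4)) + 5.0

# C: z-scores of A (ddof=1 so that C'C / (n - 1) is the correlation matrix).
C = (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)
n = C.shape[0]

# PCA on correlations: eigendecomposition of the correlation matrix.
R = C.T @ C / (n - 1)
eigvals, V = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]          # sort components by decreasing eigenvalue
eigvals, V = eigvals[order], V[:, order]

Y = C @ V                                  # PC scores
L = V * np.sqrt(eigvals)                   # loadings: column j of V times sqrt(eigenvalue j)
Z = Y / Y.std(axis=0, ddof=1)              # standardized (z-scored) PC scores

# Standard deviations of the scores equal sqrt(eigenvalues).
sd_matches = np.allclose(Y.std(axis=0, ddof=1), np.sqrt(eigvals))
# Both products reconstruct C, the z-scored data.
scores_times_eigvecs = np.allclose(Y @ V.T, C)
zscores_times_loadings = np.allclose(Z @ L.T, C)
# The product tried in the question does not reconstruct A.
question_attempt = np.allclose(C @ L.T, A)

print(sd_matches, scores_times_eigvecs, zscores_times_loadings, question_attempt)
```

Under these assumptions the first three checks hold up to floating-point tolerance and the last one fails, which matches the conclusion of the comments: Y V' and Z L' both give back C, while C L' does not give back A.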

1 Answer

I think that @amoeba's answer is the best resource on the subject. I'll only say one thing that may help you understand it.

Mechanically, PCA is just a matrix multiplication: $$Y=XW,$$ where $Y$ holds your principal components (the scores), $X$ is the input data, and $W$ is a matrix of coefficients.

The only trick here is that your matrix of coefficients is quite special: its columns are orthonormal eigenvectors, so when all components are kept $WW'=I$, and you can use it to recover the original matrix:

$$X=YW',$$ where $W'$ is the transpose of the coefficient matrix. You can get all the details in the above-mentioned answer.
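
As a quick illustration of the two formulas, here is a minimal NumPy sketch. It assumes that $X$ is column-centered and that $W$ is taken as the full set of eigenvectors of the covariance matrix; the random data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical input data (assumption), centered column-wise.
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 3))
X = X - X.mean(axis=0)

# W: eigenvectors of the covariance matrix, one per column.
cov = X.T @ X / (X.shape[0] - 1)
_, W = np.linalg.eigh(cov)

Y = X @ W            # principal components: Y = X W
X_back = Y @ W.T     # reconstruction:       X = Y W'

# W is orthogonal, which is what makes the reconstruction work.
ortho_ok = np.allclose(W.T @ W, np.eye(3))
recon_ok = np.allclose(X_back, X)
print(ortho_ok, recon_ok)
```

The reconstruction is exact here only because all components are kept; with fewer columns of $W$, $YW'$ would be an approximation of $X$.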

Aksakal