So I have been trying to understand PCA for the past day, and the part I don't understand is how the original data is reconstructed after dimensionality reduction. Below is the code I was following from here. I understand what the code is doing for the most part, but the one line I don't understand is this one:
Xt_reconstructed <- Xt_projected %*% t(res$rotation[,1:pc.use])
# Generate data
m=50 # columns or features
n=100 # rows or observations
frac.gaps <- 0.5 # the fraction of data with NaNs (not used in this excerpt)
N.S.ratio <- 0.25 # the noise-to-signal ratio for adding noise to the data (not used in this excerpt)
x <- (seq(m)*2*pi)/m
t <- (seq(n)*2*pi)/n
#True field
Xt <-
  outer(sin(x), sin(t)) +
  outer(sin(2.1*x), sin(2.1*t)) +
  outer(sin(3.1*x), sin(3.1*t)) +
  outer(tanh(x), cos(t)) +
  outer(tanh(2*x), cos(2.1*t)) +
  outer(tanh(4*x), cos(0.1*t)) +
  outer(tanh(2.4*x), cos(1.1*t)) +
  tanh(outer(x, t, FUN="+")) +
  tanh(outer(x, 2*t, FUN="+"))
Xt <- t(Xt) # data matrix nxm
# PCA
res <- prcomp(Xt, center = TRUE, scale = FALSE)
names(res)
#plot(cumsum(res$sdev^2/sum(res$sdev^2))) # cumulative explained variance
########################################
# reconstruction of original data from lower number of features
######################################
pc.use <- 3 # num of principal components to use
Xt_projected <- res$x[,1:pc.use] # projection of original data onto PCs
Xt_reconstructed <- Xt_projected %*% t(res$rotation[,1:pc.use])
# add the center (and re-scale) back to original data
# (res$scale and res$center are either FALSE or a length-m vector, so test with isFALSE())
if (!isFALSE(res$scale)) {
  Xt_reconstructed <- scale(Xt_reconstructed, center = FALSE, scale = 1/res$scale)
}
if (!isFALSE(res$center)) {
  Xt_reconstructed <- scale(Xt_reconstructed, center = -1 * res$center, scale = FALSE)
}
dim(Xt_reconstructed); dim(Xt)
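As a quick sanity check of my own (this is not from the code I was following), I tried keeping all m components instead of 3. The same two lines then give back Xt up to floating-point error, which makes me think the transpose is somehow acting as the inverse of the rotation:
# my own check: reconstructing with ALL components should recover Xt exactly
full_recon <- res$x %*% t(res$rotation)                               # back into feature space
full_recon <- scale(full_recon, center = -res$center, scale = FALSE)  # add the center back
max(abs(full_recon - Xt))   # essentially zero (floating-point noise)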
So if my understanding is correct, Xt_projected is Xt projected onto a 3-dimensional space, where each axis of that space is defined by one of the eigenvectors. Now, to reconstruct the data back to the original m = 50 dimensions, why are we multiplying Xt_projected by the transpose of the eigenvectors that were used to map Xt into the 3-dimensional space? Am I missing something here?
I'm guessing that when Xt_projected is multiplied by the transpose of the eigenvectors that projected Xt to Xt_projected, it effectively "reconstructs" the original data. But this is just a guess, and even if it is correct, I still don't intuitively understand what is happening when the projected data is multiplied by the transpose of the eigenvectors that projected it there.
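To test that guess I also poked at the rotation matrix itself (again my own addition, run after the code above): the columns of res$rotation are orthonormal, so for the full square matrix the transpose is its inverse, while with only 3 columns the product is a projection onto their subspace rather than the identity.
# my own check, assumes the code above has been run
V  <- res$rotation          # m x m matrix whose columns are the eigenvectors
Vk <- V[, 1:pc.use]         # the 3 eigenvectors actually used
# the columns are orthonormal, so t(Vk) %*% Vk is the 3 x 3 identity ...
max(abs(crossprod(Vk) - diag(pc.use)))   # essentially zero
# ... and for the full square matrix, V %*% t(V) is the m x m identity,
# which is why projecting and then multiplying by t(V) is a round trip
max(abs(tcrossprod(V) - diag(m)))        # essentially zero
# with only 3 columns, Vk %*% t(Vk) is not the identity: it is the projection
# matrix onto the 3-dimensional subspace spanned by those eigenvectors, so the
# reconstruction is only an approximation (the best rank-3 one) of the centered data
Even seeing these identities numerically, the intuition still escapes me.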
Sorry if this is unclear, please tell me so that I can edit it accordingly.