
I would like to remove the first principal component from a data set, but keep that data set in its original coordinates. I have taken a stab at this by performing PCA, zeroing the first PC, and then rotating back using the inverse of the eigenvector matrix. Is that the most efficient way to do this?

Create a sample data set:

set.seed(1234)
xx <- rnorm(500)
yy <- 0.5 * xx + rnorm(500, sd = 0.3)
vec <- cbind(xx, yy)
plot(vec, xlim = c(-4, 4), ylim = c(-4, 4))

[Figure: scatter plot of the original data]

Take principal components and zero out the first PC:

vv <- eigen(cov(vec))$vectors   # eigenvectors of the covariance matrix, one per column
newvec <- vec %*% vv            # rotate the data into PC coordinates
newvec[, 1] <- 0                # zero out the scores on the first PC

Now rotate the new data set back to its original coordinates using the inverse of the PCA rotation matrix:

rvec <- newvec %*% t(vv) # transpose of orthogonal matrix = inverse
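# optional sanity check: vv %*% t(vv) should be the identity matrix
# (up to floating-point error), confirming that t(vv) inverts vv
stopifnot(isTRUE(all.equal(vv %*% t(vv), diag(2))))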
# plot new points in red and plot eigenvectors in green
points(rvec, col = "red")
arrows(0, 0, vv[1,1], vv[2, 1], col = "green3", lwd = 2)
arrows(0, 0, vv[1,2], vv[2, 2], col = "green3", lwd = 2)
legend("topleft", legend = c("original data", "data after extracting PC1", "eigenvectors of original data"), fill = c("black", "red", "green3"))

[Figure: original data (black), data after removing PC1 (red), and eigenvectors of the original data (green)]

As you can see, the result agrees with the eigenvector orientations shown in green. But is this the correct and/or best way? Can I avoid the intermediate matrix multiplications, for example?
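A quick numerical check of that visual impression (a sketch using only the objects defined above): after the round trip, the reconstructed data should have no component left along the first eigenvector.

# scores of the reconstructed data on the first eigenvector:
# zero up to floating-point noise
max(abs(rvec %*% vv[, 1]))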

Thomas Browne
  • +1 Well formulated and illustrated question. Your `vv` is an orthogonal matrix, hence its inverse equals its transpose. No need to compute the inverse. Answer: correct but not the best :) – amoeba Aug 04 '16 at 11:33
  • duly edited to replace solve() with t() – Thomas Browne Aug 04 '16 at 11:47
  • 2
    Cool, but now there is nothing to answer anymore. The answer is Yes. – amoeba Aug 04 '16 at 11:50
  • Just out of curiosity: after determining `vv`, shouldn't this also be possible in a single step instead of transforming all data to PC space and back? This would boil down to projecting data in the original space onto the subspace represented by the red line in the figure above, which is known after computing `vv`, and therefore shouldn't be too tricky, right? – geekoverdose Aug 04 '16 at 12:43
  • @geekoverdose Yes, but this "single step" will involve projecting $X$ via $WW^\top$, i.e. computing $XWW^\top$, which is not that different from computing $XW$ and then multiplying with $W^\top$ (here $W$ denotes `vv`, or rather all columns of `vv` apart from the first). – amoeba Aug 04 '16 at 13:02
  • Thomas, I ended up writing a separate question and answer on the topic of reconstructing original features from a subset principal components, trying to make it very general. I have now voted to close this Q as a duplicate. Please feel free to ask any questions; I will appreciate any feedback on how I can improve that thread (there are many questions here that I plan to close as duplicates of that one). Cheers. CC to @geekoverdose. – amoeba Aug 12 '16 at 11:36
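For reference, a minimal sketch of the single-step projection that amoeba describes in the comments above, assuming `vec` and `vv` from the question (`W` and `rvec2` are illustrative names):

W <- vv[, -1, drop = FALSE]     # every eigenvector except the first
rvec2 <- vec %*% tcrossprod(W)  # X W W': project straight onto the remaining subspace
all.equal(rvec, rvec2)          # agrees with the two-step rotation above

As noted in the comments, this is not meaningfully cheaper than rotating to PC space and back, but it makes the projection interpretation explicit.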
