
I have a matrix $X$ and I would like to find its first principal component and the corresponding loadings. I would like to do this without computing the covariance matrix of $X$. How can I do so?

This is the standard version, which uses the eigendecomposition of the covariance matrix.

using LinearAlgebra: eigen
using Statistics: mean

function find_principal_component(X)
    n = size(X, 1)
    B = X .- mean(X, dims=1)                  # Center the columns of X
    evalues, V = eigen(B'B / (n - 1))         # Eigendecomposition of the covariance matrix
    PC = V[:, argmax(evalues)]                # Eigenvector of the largest eigenvalue
    return B * PC, PC
end

Alternatively, one could use the power method, although it still forms the covariance matrix explicitly:

using LinearAlgebra: norm

function power_method(X, niter=50)
    pc = randn(size(X, 2))
    pc /= norm(pc)                 # random unit-norm starting vector
    M = X'X                        # covariance matrix (up to 1/(n - 1)), assuming X is column-centered
    for i in 1:niter
        pc = M * pc
        pc /= norm(pc)             # renormalize after each multiplication
    end
    return X * pc, pc
end

I would like something like the power method, but without needing to compute the covariance matrix, which can be quite costly.

Possible solution

I noticed something interesting. Let $r_t$ be the principal component vector at iteration $t$. The idea of the power method is to start with a random $r_t$ and multiply it by $X^\top X$ repeatedly to stretch it towards the principal component:
$$ r_{t+1} = X^\top X r_t $$
Once we have the principal component $r_t$, the loadings are simply
$$ \ell_t = X r_t $$
This means we can write
$$ r_{t+1} = X^\top \ell_t $$
One could therefore start with $r_t$ and $\ell_t$ initialized randomly and then iterate
$$ \begin{align} r_{t+1} &= \widehat{X^\top \ell_t} \\ \ell_{t+1} &= X r_{t+1} \end{align} $$
where the hat denotes normalization to unit length.
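
A minimal sketch of this alternating update (the function name and fixed iteration count are my own, mirroring power_method above; X is assumed to be column-centered already):

using LinearAlgebra: norm

# Alternating update from above: r ← X'ℓ (renormalized), ℓ ← X r.
# Only matrix-vector products with X and X' appear; X'X is never formed.
function alternating_power_method(X, niter=50)
    r = randn(size(X, 2))
    r /= norm(r)
    ℓ = X * r
    for _ in 1:niter
        r = X' * ℓ              # r_{t+1} ∝ X' ℓ_t
        r /= norm(r)            # the hat: renormalize to unit length
        ℓ = X * r               # ℓ_{t+1} = X r_{t+1}
    end
    return ℓ, r                 # loadings and principal component, as above
end

Each iteration costs two matrix-vector products with X, so nothing of the size of the covariance matrix is ever stored.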

  • Why is your proposed solution an answer to your question? Your method involves computing a product with X'X at each iteration, which is a scalar multiple of the covariance. I thought you wanted to avoid forming the covariance matrix. – Sycorax Dec 02 '21 at 17:21
  • @Sycorax my proposed solution is just an idea I had but it doesn't really fit my criteria, as you explained. I just noticed that one could do two matrix-vector multiplications rather than one large matrix-matrix multiplication – Physics_Student Dec 02 '21 at 17:23
  • Just use truncated SVD: it already exists in Julia: https://github.com/JuliaLinearAlgebra/TSVD.jl – usεr11852 Dec 02 '21 at 17:37
  • @Physics_Student but certainly doing one matrix-vector product would be better than doing two, right? – Sycorax Dec 02 '21 at 17:41
  • Literature on this should be around what is called "Lanczos Bidiagonalization" methods. You might want to invest time to reimplement it just so you understand what it does. – usεr11852 Dec 02 '21 at 17:48
  • @Sycorax yes, I was being an idiot. I guess I am just trying to find a way to do this that would be fast for a very large $X$ – Physics_Student Dec 02 '21 at 17:55
  • Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions, Halko et al., 2009 (arXiv:0909.4061) https://arxiv.org/pdf/0909.4061.pdf – Sycorax Dec 02 '21 at 18:01
  • +1. Fun paper. I don't remember much about it, but I remember having fun reading it. – usεr11852 Dec 02 '21 at 20:58

1 Answer


Suppose $X$ is mean-centered (you have subtracted the mean of each column) with the columns storing the features and the rows storing the observations. PCA is the eigendecomposition of the covariance matrix $\Sigma = \frac{1}{n-1}X^T X$.

There is a deep relationship between PCA and SVD. In fact, you can use SVD to compute PCA. See: Relationship between SVD and PCA. How to use SVD to perform PCA?
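
For instance (variable names here are mine; svd is the dense SVD from LinearAlgebra, so this is meant as a correctness check rather than as the cheap route):

using LinearAlgebra: svd
using Statistics: mean

X = randn(200, 10)
B = X .- mean(X, dims=1)       # center the columns, as in find_principal_component

F = svd(B)                     # B = F.U * Diagonal(F.S) * F.V'
pc = F.V[:, 1]                 # right singular vector of the largest singular value
loadings = B * pc              # what the question's code returns as B * PC

Up to sign, pc matches the eigenvector returned by find_principal_component, because the right singular vectors of $B$ are exactly the eigenvectors of $B^\top B$.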

SVD does not require forming $\Sigma$, and you can use power iteration to compute the singular vector corresponding to the largest singular value directly from $X$. Note that this works best when the largest singular value is well separated from the other singular values.
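
Here is a sketch of that covariance-free power iteration (the function name, iteration count, and tolerance are choices of mine; structurally it is the question's power_method with M * pc replaced by two matrix-vector products):

using LinearAlgebra: norm

# Power iteration with X'X applied implicitly: each step costs two
# matrix-vector products with X, and the covariance matrix is never formed.
# Assumes the columns of X have already been centered.
function implicit_power_method(X, niter=200, tol=1e-10)
    v = randn(size(X, 2))
    v /= norm(v)
    for _ in 1:niter
        w = X' * (X * v)       # (X'X) v without materializing X'X
        w /= norm(w)
        if norm(w - v) < tol   # converged; X'X is PSD, so no sign flipping
            v = w
            break
        end
        v = w
    end
    return X * v, v            # same return convention as power_method
end

The parenthesization X' * (X * v) is what keeps the cost at two matrix-vector products per iteration; writing (X' * X) * v would form the large square matrix all over again.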

This is simply a naive approach using tools that you've already outlined. I'm sure there are better ones, perhaps exploiting specific knowledge about $X$.

Sycorax