I am reading "Eigenfaces for Recognition" to understand the Eigenface approach, but I can't follow the derivation of the PCA as presented in the paper.
In the algorithm it is necessary to perform PCA on the covariance matrix $AA^T$. However, $A$ is of size $N^2\times M$, where each image is $N \times N$ pixels (so a flattened image has $N^2$ components) and $M$ is the number of images. This means $AA^T$ is an $N^2 \times N^2$ matrix with $N^2$ very large, so finding its eigenvectors directly is computationally intensive.
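To make the sizes concrete, here is a small NumPy sketch (the values $M = 40$ and $N = 64$ are just made-up numbers, not taken from the paper) showing how much larger $AA^T$ is than $A^TA$:

```python
import numpy as np

# Hypothetical sizes: M = 40 training images, each N x N = 64 x 64 pixels.
M, N = 40, 64

# A holds one flattened (mean-subtracted) image per column, so A is N^2 x M.
A = np.random.randn(N * N, M)

print(A.shape)           # (4096, 40)
print((A @ A.T).shape)   # (4096, 4096)  -- the full covariance-sized matrix
print((A.T @ A).shape)   # (40, 40)      -- the small surrogate matrix
```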
The paper then says that for PCA it is enough to find the eigenvectors of $A^TA$ instead of $AA^T$, which is much easier because $A^TA$ is only $M \times M$ and $M \ll N^2$. The reasoning is the following:
The eigenvectors of $A^TA$ are the vectors $\vec{v}_i$ such that, for some scalar $\lambda_i$,
\begin{equation} A^TA\vec{v}_i = \lambda_i\vec{v}_i \end{equation}
and once we left-multiply by $A$ we get
\begin{equation} AA^T(A\vec{v}_i) = \lambda_i(A\vec{v}_i), \end{equation} from which we see that the vectors $A\vec{v}_i$ are eigenvectors of $AA^T$, with the same eigenvalues $\lambda_i$.
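Numerically this is easy to check; the following sketch (again with made-up sizes) decomposes $A^TA$ and verifies that each $A\vec{v}_i$ satisfies the eigenvector equation for $AA^T$:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 40, 64                          # hypothetical sizes, as above
A = rng.standard_normal((N * N, M))    # N^2 x M data matrix

# Eigendecompose the small M x M matrix A^T A; columns of V are the v_i.
lam, V = np.linalg.eigh(A.T @ A)

# Map the eigenvectors up to the image space: u_i = A v_i, normalized.
U = A @ V
U /= np.linalg.norm(U, axis=0)

# Check that (A A^T) u_i = lambda_i u_i, i.e. the u_i are eigenvectors
# of the big N^2 x N^2 matrix with the same eigenvalues lambda_i.
print(np.allclose((A @ A.T) @ U, U * lam))   # True
```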
The authors then proceed to use these eigenvectors in the algorithm, but why are we allowed to? They are not the eigenvectors $\vec{v}_i$ we just computed, but the transformed vectors $A\vec{v}_i$, i.e. different vectors belonging to a different matrix.
So the question is: why can we use the eigenvectors of $A^TA$ instead of those of $AA^T$? Is this something specific to the Eigenfaces algorithm?