I am reading "Eigenfaces for Recognition" to understand the Eigenface approach, but I can't follow the derivation of the PCA as presented in the paper.
In the algorithm it is necessary to perform PCA on the covariance matrix $AA^T$. However, $A$ is of size $N^2\times M$, where each image is $N \times N$ pixels (so a flattened image has $N^2$ components) and $M$ is the number of images. This means $AA^T$ is an $N^2 \times N^2$ matrix with $N^2$ very large, so finding its eigenvectors directly is computationally intensive.
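To make the sizes concrete, here is a small NumPy sketch (the values $M = 40$ and $N = 64$ are just made-up numbers, not taken from the paper) showing how much larger $AA^T$ is than $A^TA$:

```python
import numpy as np

# Hypothetical sizes: M = 40 training images, each N x N = 64 x 64 pixels.
M, N = 40, 64

# A holds one flattened (mean-subtracted) image per column, so A is N^2 x M.
A = np.random.randn(N * N, M)

print(A.shape)           # (4096, 40)
print((A @ A.T).shape)   # (4096, 4096)  -- the full covariance-sized matrix
print((A.T @ A).shape)   # (40, 40)      -- the small surrogate matrix
```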
The paper then says that for PCA it is enough to find the eigenvectors of $A^TA$ instead of $AA^T$, which is much easier because $A^TA$ is only $M \times M$ and $M \ll N^2$. The reasoning is the following:
The eigenvectors of $A^TA$ are the vectors $\vec{v}_i$ such that, for some scalar $\lambda_i$,
\begin{equation} A^TA\vec{v}_i = \lambda_i\vec{v}_i \end{equation}
and once we left-multiply by $A$ we get
\begin{equation} AA^T(A\vec{v}_i) = \lambda_i(A\vec{v}_i), \end{equation} from which we see that the vectors $A\vec{v}_i$ are eigenvectors of $AA^T$, with the same eigenvalues $\lambda_i$.
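Numerically this is easy to check; the following sketch (again with made-up sizes) decomposes $A^TA$ and verifies that each $A\vec{v}_i$ satisfies the eigenvector equation for $AA^T$:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 40, 64                          # hypothetical sizes, as above
A = rng.standard_normal((N * N, M))    # N^2 x M data matrix

# Eigendecompose the small M x M matrix A^T A; columns of V are the v_i.
lam, V = np.linalg.eigh(A.T @ A)

# Map the eigenvectors up to the image space: u_i = A v_i, normalized.
U = A @ V
U /= np.linalg.norm(U, axis=0)

# Check that (A A^T) u_i = lambda_i u_i, i.e. the u_i are eigenvectors
# of the big N^2 x N^2 matrix with the same eigenvalues lambda_i.
print(np.allclose((A @ A.T) @ U, U * lam))   # True
```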
The authors then proceed to use these eigenvectors in the algorithm, but why are we allowed to? They are not the eigenvectors $\vec{v}_i$ we just computed, but the transformed vectors $A\vec{v}_i$, i.e. different vectors belonging to a different matrix.
So the question is: why can we use the eigenvectors of $A^TA$ instead of those of $AA^T$? Is this something specific to the Eigenfaces algorithm?