I am currently learning about applying PCA and ZCA to machine learning problems such as classification and clustering. I would like to apply them mostly, but not only, to image data. From what I understand, if we have a data matrix $X$ with dimensions $(n,m)$, where $n$ is the number of features and $m$ is the number of samples, then (assuming $X$ has been centred, and ignoring the normalisation factor) we can calculate the covariance matrix as $\Sigma_1 = XX^T$ if we want to reduce correlations between the features, and as $\Sigma_2=X^TX$ if we want to reduce correlations between the samples.
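To make the setup concrete, here is a minimal NumPy sketch of the two matrices. The sizes and the random data are placeholders chosen for illustration, and the centring step (subtracting each feature's mean over the samples) is an assumption about how $X$ is prepared:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 20                    # hypothetical sizes: n features, m samples
X = rng.normal(size=(n, m))      # columns are samples, as in the text

# Assumption: centre each feature (row) over the samples, so that
# Xc @ Xc.T is the feature covariance up to the factor 1 / (m - 1).
Xc = X - X.mean(axis=1, keepdims=True)

sigma_1 = Xc @ Xc.T              # shape (n, n): relates features to each other
sigma_2 = Xc.T @ Xc              # shape (m, m): relates samples to each other

print(sigma_1.shape, sigma_2.shape)
```

With this convention, `sigma_1` agrees with `(m - 1) * np.cov(X)`, since `np.cov` treats rows as variables by default.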
My question: is there a rule of thumb for deciding whether, in a given case, it makes more sense to use $\Sigma_1$ or $\Sigma_2$?
I arrived at this question after finding that calculating the SVD of $\Sigma_1$, which has dimensions $(n,n)$, is not feasible on my computer if $n>4000$. In practice this rules out colour images much larger than about $32\times 32$ pixels ($32\cdot 32\cdot 3 = 3072$ features). But if $m<n$, let's say $m\approx 1000$, I could decompose $\Sigma_2$ much more quickly than $\Sigma_1$. Additional questions: What caveats do you see in my idea? Is there an easy way to speed up the SVD of $\Sigma_1$ with some Python package?
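A minimal sketch of the idea, under the same assumptions as above (centred $X$, placeholder sizes and random data). It relies on the fact that $XX^T$ and $X^TX$ share their nonzero eigenvalues, and that an eigenvector $v$ of $X^TX$ with eigenvalue $\lambda > 0$ gives the eigenvector $Xv/\sqrt{\lambda}$ of $XX^T$. Whether this is the right route for a particular dataset is exactly what the question is about:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 300, 40                           # hypothetical sizes with m < n
X = rng.normal(size=(n, m))
Xc = X - X.mean(axis=1, keepdims=True)   # centre each feature over the samples

# Eigendecomposition of the small (m, m) matrix instead of the (n, n) one.
sigma_2 = Xc.T @ Xc
evals, V = np.linalg.eigh(sigma_2)       # eigh returns ascending eigenvalues
order = np.argsort(evals)[::-1]          # reorder to descending
evals, V = evals[order], V[:, order]

# Drop numerically zero eigenvalues (centring removes one rank).
keep = evals > 1e-10 * evals[0]
evals, V = evals[keep], V[:, keep]

# Map to unit-norm eigenvectors of sigma_1 = Xc @ Xc.T.
U = (Xc @ V) / np.sqrt(evals)

# Sanity check against the large matrix (only feasible here because n is small).
sigma_1 = Xc @ Xc.T
print(np.allclose(sigma_1 @ U, U * evals))
```

Note that this yields at most $m$ directions (here $m-1$, because of the centring), so any feature-space direction outside the span of the samples is not recovered.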