
The PCA method finds the covariance between data vectors, where each data vector holds the observations of the different variables (dimensions). So if the data matrix has variables in columns and observations in rows, we find the covariance between rows. Right? Does it make sense to do PCA by finding the covariance between the different dimensions instead? In the representation above, that means finding the covariance between columns. I am asking because I found that in some standard libraries the algorithm performs the latter instead.
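For concreteness, here is a minimal NumPy sketch of the column-wise variant the question asks about (variable names and the random data are illustrative, not from any particular library): the covariance is computed between the $D$ columns, so it is $D\times D$, and its top eigenvectors directly form a $D\times d$ mapping matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, d = 100, 5, 2
X = rng.normal(size=(N, D))          # N observations in rows, D variables in columns

# Covariance *between the columns* (variables) -> a D x D matrix.
Xc = X - X.mean(axis=0)              # center each variable
C = Xc.T @ Xc / (N - 1)              # same result as np.cov(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(C) # eigenvectors have length D
order = np.argsort(eigvals)[::-1]    # sort by decreasing variance
W = eigvecs[:, order[:d]]            # D x d mapping matrix
Z = Xc @ W                           # N x d reduced data
```

Note that no transpose of the data set is needed at the end: `W` already maps $D$-dimensional points to $d$ dimensions.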

Avestan
  • It is standard to perform PCA on the variables (typically columns, as you note). It is possible, but very unusual, to perform it on observations / rows. – gung - Reinstate Monica Dec 27 '15 at 02:47
  • Are the results similar? Say the data is $N\times D$ and the feature space is $d$-dimensional. The standard way of doing PCA gives an $N\times N$ covariance matrix, and the final mapping matrix must be multiplied by the transpose of the input data set to give the $D\times d$ mapping, which is what we want. The unusual approach directly gives the $D\times d$ mapping matrix! – Avestan Dec 27 '15 at 02:59
  • The *standard* use of PCA is to take an $N\times D$ data matrix and compute a $D\times D$ covariance matrix. Your eigenvectors are of length $D$. Etc. – gung - Reinstate Monica Dec 27 '15 at 03:17
  • @usεr11852 Thank you so much. The link you sent me is very clear. Sorry for sending a duplicate! – Avestan Dec 27 '15 at 04:29
  • No biggie, I am glad I could help. That's what the community is about. – usεr11852 Dec 27 '15 at 11:38
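The relationship between the two approaches discussed in the comments can be checked numerically (a sketch on random data): the $N\times N$ Gram matrix of the centered rows and the $D\times D$ scatter matrix of the columns share their nonzero eigenvalues, and the SVD of the centered data links their eigenvectors, which is why both routes recover the same principal directions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))          # small N x D example
Xc = X - X.mean(axis=0)

C = Xc.T @ Xc                        # D x D scatter matrix (standard PCA, unnormalized)
G = Xc @ Xc.T                        # N x N Gram matrix ("covariance between rows")

# SVD Xc = U S V^T: the columns of V are eigenvectors of C, the columns of U
# are eigenvectors of G, and both matrices share the eigenvalues S^2.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
print(np.allclose(np.linalg.eigvalsh(C)[::-1], s**2))   # True
print(np.allclose(C @ Vt[0], s[0]**2 * Vt[0]))          # True
```

So the "unusual" $N\times N$ route is the dual of the standard one; libraries sometimes pick whichever of the two matrices is smaller.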

0 Answers