
Suppose that I am given various samples of a vector random variable as the columns $v_1,v_2,\dots,v_n$ of a certain matrix $A$. Is there a relation between

  1. the SVD $A = USV^T$
  2. the SVD $\hat{A} = \hat{U}\hat{S}\hat{V}^T$, where $\hat{A}$ is the 'de-meaned' version of $A$, i.e. the matrix with columns $\hat{v}_j = v_j - \frac{1}{n}\sum_{k=1}^n v_k$.

If I understand correctly, when doing principal component analysis one typically uses the latter, because that is the matrix such that $\frac{1}{n-1}\hat{A}\hat{A}^T$ is the (sample) covariance matrix of the $v_j$. Would it make a big difference to use the former instead? Is there an algebraic relation between the two sets of singular values and vectors obtained? Is one considered more meaningful than the other in practice?
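The setup above is easy to experiment with numerically. The following sketch (with made-up random data; the sizes and seed are arbitrary) computes both SVDs side by side and confirms the stated relation between $\hat{A}$ and the sample covariance matrix:

```python
import numpy as np

# Hypothetical data: the columns v_1, ..., v_n of A are the samples.
rng = np.random.default_rng(0)
m, n = 5, 8
A = rng.normal(loc=3.0, size=(m, n))

# De-meaned version: subtract the mean of the columns from each column.
A_hat = A - A.mean(axis=1, keepdims=True)

# SVDs of the raw and of the centered matrix.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
U_hat, S_hat, Vt_hat = np.linalg.svd(A_hat, full_matrices=False)

# (1/(n-1)) * A_hat A_hat^T is the sample covariance matrix of the v_j,
# which np.cov computes directly (rows = variables, columns = samples).
print(np.allclose(A_hat @ A_hat.T / (n - 1), np.cov(A)))  # True
print(S)      # singular values of the raw matrix
print(S_hat)  # singular values of the centered matrix
```

The two sets of singular values generally differ, and (as the comments below note) there is no simple closed-form map from one SVD to the other.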

Federico Poloni
  • Centering brings in a big difference. PCA with centering maximizes SS deviations from the mean (i.e. variance); PCA on raw data maximizes SS deviations from the zero point. With a real dataset, it is impossible to convert the eigenvectors of one analysis into those of the other. Please search the site for `PCA centering`, the topic was covered more than once here. – ttnphns Sep 24 '20 at 21:52
  • Start from https://stats.stackexchange.com/q/22329/3277 and https://stats.stackexchange.com/q/189822/3277 – ttnphns Sep 24 '20 at 21:55
  • Thanks! I was hoping for some more information (a quantitative treatment, or a discussion of how the principal components other than the first change), but that gives a basic answer (and if there is really no explicit relation I understand that it is difficult to tell more). – Federico Poloni Sep 25 '20 at 07:38
  • In any case, it seems to me that $\hat{A}$ is a compression of the matrix $A$, so there should be at least an interlacing inequality between the two sets of singular values. – Federico Poloni Sep 25 '20 at 07:39
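The interlacing conjectured in the last comment does hold: centering is right-multiplication by the rank-$(n-1)$ orthogonal projector $P = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$, so $\hat{A} = AP$ is a rank-one perturbation of $A$ and Weyl's inequalities give $s_{k+1}(A) \le s_k(\hat{A}) \le s_k(A)$. A quick numerical check (with arbitrary random data):

```python
import numpy as np

# Centering equals right-multiplication by the orthogonal projector
# P = I - (1/n) * ones @ ones^T, which has rank n - 1, so the singular
# values should interlace:  s_{k+1}(A) <= s_k(A_hat) <= s_k(A).
rng = np.random.default_rng(42)
m, n = 6, 10
A = rng.normal(loc=2.0, size=(m, n))

P = np.eye(n) - np.ones((n, n)) / n
A_hat = A @ P  # identical to subtracting the column mean from each column

s = np.linalg.svd(A, compute_uv=False)
s_hat = np.linalg.svd(A_hat, compute_uv=False)

print(np.all(s_hat <= s + 1e-12))           # upper bound: s_k(A_hat) <= s_k(A)
print(np.all(s[1:] <= s_hat[:-1] + 1e-12))  # shifted lower bound holds too
```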

0 Answers