
Given a data matrix $\mathbf X$, can I always obtain its covariance matrix (to use in PCA) by centering it (subtracting the column means) and then computing $\mathbf X \mathbf X^\top/n$? Is this always valid, or are there restrictions on when the data can be centered this way?


I am working on quantum algorithms for principal component analysis (PCA). There is a quantum algorithm which can compute the PCA exponentially faster (at least in theory), but it works only if the covariance matrix is given in Gram form, i.e. as $\mathbf X\mathbf X^\top$. That is why I am wondering whether there are any restrictions on this form.

The relationship between SVD and PCA is explained in this excellent post, but it does not answer my question.
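As a concrete sanity check of the construction being asked about, here is a minimal NumPy sketch (the toy data and the variables-in-rows convention are my own, chosen so that $\mathbf X\mathbf X^\top/n$ is the $p\times p$ covariance matrix):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: p = 3 variables, n = 500 observations, one variable per row,
# so that Xc @ Xc.T / n is the p x p covariance matrix of the centered data.
p, n = 3, 500
X = rng.normal(size=(p, n)) + np.array([[1.0], [-2.0], [0.5]])

# Center each variable by subtracting its mean across observations.
Xc = X - X.mean(axis=1, keepdims=True)

# Covariance via the Gram-style product of the centered matrix.
C = Xc @ Xc.T / n

# Same quantity from NumPy's covariance routine (bias=True uses 1/n).
C_np = np.cov(X, bias=True)

print(np.allclose(C, C_np))  # the two constructions agree
```

With observations in columns instead of rows, the same identity reads $\mathbf X^\top\mathbf X/n$; only the orientation convention changes.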

kjetil b halvorsen
LeoW.
  • First of all, how do you define "Gram matrix"? This is a very ambiguous, misused term that sometimes means $X'X$, sometimes $XX'$, and sometimes a Gramian matrix in the sense of "positive semidefinite" or some other sense... Second, in your first link I could not find any mention of "Gram". Third, [this](http://stats.stackexchange.com/q/79043/3277) question addresses the issue of precision in the dilemma "svd or eigen". – ttnphns May 06 '16 at 08:43
  • Fourth, in a broad sense, PCA does not require centering; centering is seen as a pre-processing step. But the results and their interpretation [depend](http://stats.stackexchange.com/q/22329/3277) on whether you center or not, of course. – ttnphns May 06 '16 at 08:44
  • @ttnphns The principal components are defined as the eigenvectors of the covariance matrix. Subtracting the expected value is not optional when speaking of covariance. – cangrejo May 06 '16 at 10:46
  • @broncoAbierto, `The principal components are defined as the eigenvectors of the covariance matrix`. Not necessarily covariance. Linear PCs can be defined based on any [SSCP-type](http://stats.stackexchange.com/a/22520/3277) matrix. – ttnphns May 06 '16 at 15:19
  • @ttnphns It is definitely interesting to consider the eigendecomposition of any matrix of the form $A^TA$. Is it standard, though? If I recall correctly, in Jolliffe, Ian. Principal component analysis. John Wiley & Sons, Ltd, 2002. PCs are specifically defined for the covariance matrix, as their purpose is to maximize the variance of the matrix columns when projected onto the span of the resulting subspaces. – cangrejo May 06 '16 at 15:31
  • @broncoAbierto, Let us not dispute about "standards" or ghosts. Btw, interestingly, Jolliffe in his book discusses PCA on $X'X$; I remember having seen it there. – ttnphns May 06 '16 at 16:07
  • @ttnphns, thanks for pointing out the question which partly addresses the first question. Different issues are mentioned there (a slower SVD vs. numerical stability of the SVD); I see that it is actually implementation-dependent as well. Since I am working on quantum algorithms, I suppose this matter can be dismissed, as it depends on the procedure used. Concerning question 2, I was referring to the case $R=XX^{\top}$ (as used in the first link). I updated the question and removed question 1. Thanks! – LeoW. May 06 '16 at 19:55
  • I still don't understand what you are asking, @LeoW. What exactly do you mean by "Gram matrix"? And what is your remaining question? If you are asking about the effect of centering on PCA, then it's a duplicate, see one of ttnphns's links above. – amoeba May 06 '16 at 20:17
  • Okay, so let me clarify the question a bit: @amoeba, you pointed out in one of your posts that the covariance matrix is equal to $XX^{\top}$ for centered data only, and we can center the data by subtracting the mean (correct?). Are there any restrictions on when this can be applied? It seems I should always be able to calculate the mean and subtract it. There exists a quantum algorithm which can compute the PCs exponentially faster (at least in theory) and which works if the covariance matrix is given in Gram form, so I wonder whether there are any restrictions arising from that form. – LeoW. May 06 '16 at 21:36
  • No restrictions: you can always subtract the mean to get the "centered" data, and then $XX^\top/n$ will equal the covariance matrix. – amoeba May 06 '16 at 22:16
  • Great. However, your question remains very confusingly formulated. If my above comment is an answer to your question, then can your question be reformulated as "Given a data matrix X, can I always obtain its covariance matrix by subtracting the column means and computing XX'/n? Is this always valid?". Is this what you wanted to ask? – amoeba May 07 '16 at 21:19
  • @amoeba: Reading the question as you formulated it, yes. Makes it probably way clearer :) – LeoW. May 08 '16 at 16:08
  • I edited your question (and upvoted). Please check that everything makes sense. – amoeba May 11 '16 at 12:56
  • @amoeba I think the comments provided are not sufficient. Yes, you can calculate a covariance matrix for a data set as an arithmetic operation. The issue, I think, is whether doing this is meaningful in a non-multivariate-normal distributional setting. – Cagdas Ozgenc May 11 '16 at 13:58
  • @CagdasOzgenc: I don't think *that* was the issue raised by the OP. But I might be wrong. The question is now heavily edited, but you can take a look at the original formulation in the edit history. It was very unclear, at least to me. – amoeba May 11 '16 at 14:00
  • @amoeba He asked if there are any restrictions. I would say if the distribution is fat tailed (or in extreme case like Cauchy) then the ad-hoc covariance matrix procedures will turn out to be ad-hoc useless. – Cagdas Ozgenc May 11 '16 at 14:06
  • @CagdasOzgenc: That is fair enough. I think this would be an answer to the question "are there any restrictions to using sample covariance matrix in PCA?". But the answer to the question "are there any restrictions to using XX'/n with centered X as sample covariance matrix?" is simply "no". In any case, I encourage you to post an answer here elaborating on all that. – amoeba May 11 '16 at 14:08
  • @amoeba Actually, vice versa. There are restrictions to using XX'/n as a covariance matrix, those that I mentioned. – Cagdas Ozgenc May 11 '16 at 14:49
  • @Cagdas You mean that the sample covariance matrix will be a bad estimator of the population covariance matrix? It can be, yes. But the *sample covariance matrix* is just XX'/n by definition. – amoeba May 11 '16 at 16:08
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/39633/discussion-between-cagdas-ozgenc-and-amoeba). – Cagdas Ozgenc May 11 '16 at 16:25
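The heavy-tail caveat raised in the comments can be illustrated with a small sketch (my own toy setup, not from the thread): for Cauchy data the population variance does not exist, so $XX'/n$ is still computable but no longer estimates anything stable.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 10_000, 20

# Sample variance of n Gaussian draws, repeated: the estimates are stable.
gauss_vars = [rng.normal(size=n).var() for _ in range(trials)]

# Same procedure on Cauchy draws: the population variance does not exist,
# and the "estimates" swing over orders of magnitude between repetitions.
cauchy_vars = [rng.standard_cauchy(size=n).var() for _ in range(trials)]

print(max(gauss_vars) / min(gauss_vars))    # close to 1
print(max(cauchy_vars) / min(cauchy_vars))  # typically huge
```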

1 Answer


This is the difference between centered (or demeaned) and uncentered PCA. One recent (2016) paper that, among other things, discusses this is *Principal component analysis: a review and recent developments* by Ian T. Jolliffe and Jorge Cadima, which can be found here. I will just cite what they say:

As was seen in §2, PCA amounts to an SVD of a column-centred data matrix. In some applications [see below], centring the columns of the data matrix may be considered inappropriate. In such situations, it may be preferred to avoid any pre-processing of the data and to subject the uncentred data matrix to an SVD or, equivalently, to carry out the eigendecomposition of the matrix of noncentred second moments, T, whose eigenvectors define linear combinations of the uncentred variables. This is often referred to as an uncentred PCA and there has been an unfortunate tendency in some fields to equate the name SVD only with this uncentred version of PCA. Uncentred PCs are linear combinations of the uncentred variables which successively maximize non-central second moments, subject to having their crossed non-central second moments equal to zero. Except when the vector of column means $\bar{x}$ (i.e. the centre of gravity of the original n-point scatterplot in p-dimensional space) is near zero (in which case centred and uncentred moments are similar), it is not immediately intuitive that there should be similarities between both variants of PCA. Cadima & Jolliffe (*On relationships between uncentred and column-centred principal component analysis*) have explored the relations between the standard (column-centred) PCA and uncentred PCA and found them to be closer than might be expected, in particular when the size of vector $\bar{x}$ is large. It is often the case that there are great similarities between many eigenvectors and (absolute) eigenvalues of the covariance matrix S and the corresponding matrix of non-centred second moments, T. In some applications, row centrings, or both row- and column-centring (known as double-centring) of the data matrix, have been considered appropriate. The SVDs of such matrices give rise to row-centred and doubly centred PCA, respectively.

One case where centering is often unnatural and unhelpful is image data. One example on this site is What is the intuition behind SVD?.
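To see the difference the quoted passage describes, here is a small sketch (my own toy example) comparing the leading eigenvector of the covariance matrix S with that of the non-central second-moment matrix T when the column means are far from zero:

```python
import numpy as np

rng = np.random.default_rng(1)

# n = 200 observations of p = 2 correlated variables, shifted far from the origin.
n, p = 200, 2
A = np.array([[2.0, 0.0], [0.6, 0.3]])
X = rng.normal(size=(n, p)) @ A + np.array([10.0, 10.0])

# Centred PCA: eigenvectors of the covariance matrix S.
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / n
_, vecs_c = np.linalg.eigh(S)   # eigh returns eigenvalues in ascending order

# Uncentred PCA: eigenvectors of the non-central second-moment matrix T.
T = X.T @ X / n
_, vecs_u = np.linalg.eigh(T)

# With the centre of gravity far from the origin, the leading uncentred axis
# points roughly towards the mean rather than along the direction of
# greatest variance.
print(vecs_c[:, -1])  # leading centred PC
print(vecs_u[:, -1])  # leading uncentred PC, roughly parallel to the mean vector
```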

kjetil b halvorsen