
[Textbook exercise shown as an image in the original post]

I have this question and I think I can just use maths for (ii): if $X'Xa = \lambda a$, then $XX'(Xa) = X(X'Xa) = \lambda(Xa)$. So if $\lambda$ and $a$ are an eigenvalue/eigenvector pair of $X'X$, then $\lambda$ and $Xa$ will be an eigenvalue/eigenvector pair of $XX'$.

I'm not sure if this is enough, though, but my main problem is that I can't figure out why it is important that $X$ is a centered data matrix. How does that change the eigenvalues? For the first part of the question I would just divide $\lambda$ by $(n-1)$, but that seems too simplistic.
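
A quick numerical sanity check of this reasoning, as a minimal sketch with a randomly generated centered matrix (the variable names are only illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 4
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)                      # center the columns

# The nonzero eigenvalues of X'X and XX' coincide
ev_small = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]
ev_big = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1][:p]
print(np.allclose(ev_small, ev_big))        # True

# If a is an eigenvector of X'X with eigenvalue lambda, then Xa is an
# eigenvector of XX' with the same eigenvalue
lam, A = np.linalg.eigh(X.T @ X)
a = A[:, -1]                                # eigenvector for the largest eigenvalue
print(np.allclose((X @ X.T) @ (X @ a), lam[-1] * (X @ a)))   # True

# Dividing the eigenvalues of X'X by (n-1) matches the eigenvalues of the
# sample covariance matrix -- but only because X has been centered
cov_ev = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
print(np.allclose(ev_small / (n - 1), cov_ev))               # True
```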

  • Welcome to CV! You may want to add the self-study tag and read its wiki. (To the CV veterans, I'm voting leave open: While clearly textbook, the OP's specificity about X being centered is in keeping with the spirit and letter of the self-study tag.) – Sean Easter Apr 25 '17 at 13:43
  • I'm so sorry, did I break the rules? I'll read the wiki asap thanks! – Jyn Apr 25 '17 at 13:45
  • I don't believe you broke them. For self-study questions, we ask that posts outline a good faith attempt to solve the problem, and that they be specific about where the confusion lies. In my view you've done both of those things. – Sean Easter Apr 25 '17 at 13:47
  • 6
    It's just that if X is not centered then $X^\top X/(n-1)$ is not a covariance matrix (look up definition of covariance matrix - it includes "centering"). – amoeba Apr 25 '17 at 13:49
  • `How does the matrix being centered change the eigenvalues/eigenvectors?` The question title does not correspond to the question asked. Centering of columns of data X (n cases by p variables) does affect _values_ of the eigenvalues and eigenvectors; but the book excerpt is not about the values, it is about some basic properties or _rules_ of PCA. It tells the story that (first p) eigenvalues of X'X and of XX' are same. And that one can arrive from eigenvectors V of X'X to eigenvectors U of XX'. Which follows from the property of svd(X)=USV'. – ttnphns Apr 25 '17 at 15:18
  • (cont.) The thing is that US=XV: direct and indirect ways to compute raw principal component _scores_ of cases. And that VS'=X'U: direct and indirect ways to compute component _loadings_ of variables. (S are diagonal matrix of singular values; their squares are the eigenvalues mentioned above.) Please read first paragraphs in https://stats.stackexchange.com/a/141755/3277, as well as in _many_ other threads on this site (search terms `PCA svd loadings`). – ttnphns Apr 25 '17 at 15:18
  • @amoeba wrote "It's just that if X is not centered then $X^TX/(n−1)$ is not a covariance matrix (look up definition of covariance matrix - it includes "centering")." Huh? Any real matrix which is symmetric positive semidefinite is a covariance matrix ... of something. $X^TX/(n−1)$ is symmetric positive semidefinite, so it is a covariance matrix of something ... even if not of the thing you were thinking of. – Mark L. Stone Jan 27 '18 at 23:08
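
The SVD relations mentioned in the comments above are easy to check numerically; here is a minimal sketch with a random centered matrix (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 4
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U S V'
S = np.diag(s)
V = Vt.T

print(np.allclose(U @ S, X @ V))     # component scores: US = XV
print(np.allclose(V @ S, X.T @ U))   # loadings (up to scaling): VS' = X'U
print(np.allclose(s**2, np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]))  # squared singular values = eigenvalues of X'X
```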

1 Answer


In case you are still interested in an answer to your question title, a paper by Paul Honeine (2014) (link: http://arxiv.org/abs/1407.2904v1) may be of some relevance. In particular, Lemma 1 and Theorem 3 of that paper give relationships between the eigenvalues of $X'X$ for the centered and the non-centered $X$ matrix.

Let $K = X'X$ and $K_c = X_c'X_c$, where $X$ is the $n \times p$ data matrix and $X_c = (I - \frac{1}{n}11')X$ (i.e., its centered counterpart). Note that $\frac{1}{n}1'X = (\bar x_1, \bar x_2, \ldots, \bar x_p)$, the row vector of sample means of the $p$ variables, denoted $\mu'$. Conventionally, the eigenvalues are ordered as a decreasing sequence.
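
As a small numerical illustration of these definitions (random data, a sketch only):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 8, 3
X = rng.standard_normal((n, p))

ones = np.ones((n, 1))
Xc = (np.eye(n) - ones @ ones.T / n) @ X   # X_c = (I - (1/n) 1 1') X

mu = (ones.T @ X / n).ravel()              # (1/n) 1'X, the row vector of column means
print(np.allclose(mu, X.mean(axis=0)))     # True
print(np.allclose(Xc, X - mu))             # centering subtracts the column means
```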

Then the eigendecompositions of $K$ and $K_c$ are $K = A\Lambda A'$ and $K_c = B\Lambda_c B'$, respectively, where $\Lambda = \mathrm{Diag}\{\lambda_i,\ i = 1, 2, \ldots, p\}$ is the diagonal matrix of the eigenvalues of $K$ and $\Lambda_c = \mathrm{Diag}\{\lambda_{ci},\ i = 1, 2, \ldots, p\}$ is the diagonal matrix of the eigenvalues of $K_c$. The columns of $A$ and $B$ are the eigenvectors associated with the corresponding eigenvalues.

Applying Lemma 1 of Honeine (2014), one has the following: $$\sum_{i=1}^p \lambda_{ci} = \sum_{i=1}^p \lambda_{i} - n\mu'\mu$$

Applying Theorem 3, one has the following interlacing property among the eigenvalues: $$\lambda_{cp} \le \lambda_p \le ... \le\lambda_{i+1} \le \lambda_{ci} \le \lambda_{i} \le ...\le \lambda_{2} \le \lambda_{c1} \le \lambda_{1}$$
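
Both results can be checked numerically; the following is a minimal sketch with random data, reusing the definitions above (this is not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 8, 3
X = rng.standard_normal((n, p))
ones = np.ones((n, 1))
Xc = (np.eye(n) - ones @ ones.T / n) @ X
mu = X.mean(axis=0)

lam = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]      # eigenvalues of K, decreasing
lam_c = np.sort(np.linalg.eigvalsh(Xc.T @ Xc))[::-1]  # eigenvalues of K_c, decreasing

# Lemma 1: sum of centered eigenvalues = sum of non-centered eigenvalues - n mu'mu
print(np.isclose(lam_c.sum(), lam.sum() - n * mu @ mu))   # True

# Theorem 3 (interlacing): lambda_{i+1} <= lambda_{ci} <= lambda_i
print(np.all(lam_c <= lam))            # lambda_{ci} <= lambda_i
print(np.all(lam[1:] <= lam_c[:-1]))   # lambda_{i+1} <= lambda_{ci}
```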

  • This is interesting and may be very useful, so it would be nice to know at least a little about what those relationships are! Could you add a sentence or two about the nature of these relationships so that readers can easily decide whether to read the paper itself? – whuber Jan 28 '18 at 00:18
  • 1
    @whuber - appreciate your suggestion. More details are offered now. Hopefully, they are useful. – T Lin Jan 29 '18 at 02:35