9

In Christopher Bishop's book Pattern Recognition and Machine Learning, the section on PCA contains the following:

Given an $N \times D$ centred data matrix $\mathbf{X}$ (one data point per row) with covariance matrix $N^{-1}\mathbf{X}^T\mathbf{X}$, the eigenvector equation is:

$$N^{-1}\mathbf{X}^T\mathbf{X} \mathbf{u}_i = \lambda_i \mathbf{u}_i.$$

Defining $\mathbf{v}_i = \mathbf{X} \mathbf{u}_i$, Bishop claims that if $\mathbf{u}_i$ and $\mathbf{v}_i$ have unit length, then:

$$\mathbf{u}_i = \frac{1}{(N\lambda_i)^{\frac{1}{2}}}\mathbf{X}^T\mathbf{v}_i.$$

Where does the square root come from?


EDIT:

In particular, why is the following invalid:

$\frac{1}{N}\mathbf{X}^T\mathbf{X}\mathbf{u}_i = \lambda_i \mathbf{u}_i$

$\Rightarrow \frac{1}{N}\mathbf{X}^T \mathbf{v}_i = \lambda_i \mathbf{u}_i$ using $\mathbf{v}_i = \mathbf{X}\mathbf{u}_i$

$\Rightarrow \frac{1}{N\lambda_i}\mathbf{X}^T \mathbf{v}_i = \mathbf{u}_i$

The same result, but without the square root.
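
Indeed, a quick numerical check (random centred data; the sizes and variable names below are purely illustrative, not from the book) seems to confirm this chain of equalities:

```python
import numpy as np

# Sanity check of the derivation above on random centred data.
# N, D and the variable names are arbitrary, for illustration only.
rng = np.random.default_rng(0)
N, D = 8, 5
X = rng.standard_normal((N, D))
X -= X.mean(axis=0)                          # centre the data

lam, U = np.linalg.eigh(X.T @ X / N)         # eigenpairs of the covariance matrix
u_i, lam_i = U[:, -1], lam[-1]               # largest eigenpair; u_i has unit length

v_i = X @ u_i                                # the definition v_i = X u_i
print(np.allclose(X.T @ v_i / (N * lam_i), u_i))   # True -- no square root needed
```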

  • The square root comes from the requirement of PCA that the covariance matrix of the transformed data be the identity matrix. Without it you would be doing $VV^T x = Ix = x$ and you wouldn't have gotten anywhere. – conjectures Jan 25 '16 at 14:56
  • @amoeba I have expanded my question to contain the exact problem I am having; I will also include more context if necessary. – Danny Jan 25 '16 at 16:34
  • "...using $v_i=Xu_i$" is not justified by the defining equation $v_iX=u_i$ and is usually not true. – whuber Jan 25 '16 at 17:48
  • @whuber Whoops, that was a typo; the book does define $v_i = Xu_i$. Thanks. – Danny Jan 25 '16 at 18:07

1 Answer

19

This refers to the short section 12.1.4 PCA for high-dimensional data in Bishop's book. I can see why this section can be a bit confusing: Bishop goes back and forth between $\newcommand{\X}{\mathbf X}\newcommand{\v}{\mathbf v}\newcommand{\u}{\mathbf u}\v_i$ and $\u_i$ using slightly inconsistent notation.

The section is about the relationship between the eigenvectors of the covariance matrix $\frac{1}{N}\X^\top \X$ and the eigenvectors of the Gram matrix $\frac{1}{N}\X \X^\top$ (in the context of PCA). Let $\v_i$ be a unit-length eigenvector of $\frac{1}{N}\X \X^\top$:

$$\frac{1}{N}\X \X^\top \v_i = \lambda_i \v_i.$$

If we multiply this equation by $\X^\top$ from the left:

$$\frac{1}{N}\X^\top \X (\X^\top \v_i) = \lambda_i (\X^\top \v_i),$$

we see that $\X^\top \v_i$ is an eigenvector of $\frac{1}{N}\X^\top \X$.

However, it will not have unit length! Indeed, let us compute its length: $$\|\X^\top \v_i\|^2=(\X^\top \v_i)^\top \X^\top \v_i = \v_i^\top \X\X^\top \v_i=\v_i^\top(N\lambda_i \v_i)=N\lambda_i\|\v_i\|^2=N\lambda_i.$$ So the squared length of $\X^\top \v_i$ is equal to $N\lambda_i$. Therefore, if we want to transform $\v_i$ into a unit-length covariance matrix eigenvector $\u_i$, we need to normalize it to have unit length: $$\u_i = \frac{1}{(N\lambda_i)^{1/2}}\X^\top \v_i.$$
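
Here is a minimal NumPy sketch of this whole argument (the random data and the names `X`, `v_i`, `u_i` are made up purely for illustration):

```python
import numpy as np

# Illustrative random centred data; the sizes are arbitrary.
rng = np.random.default_rng(0)
N, D = 8, 5
X = rng.standard_normal((N, D))
X -= X.mean(axis=0)

C = X.T @ X / N                              # D x D covariance matrix
G = X @ X.T / N                              # N x N Gram matrix

lam, V = np.linalg.eigh(G)                   # eigenpairs of the Gram matrix
v_i, lam_i = V[:, -1], lam[-1]               # largest eigenpair; v_i has unit length

w = X.T @ v_i                                # candidate covariance eigenvector
print(np.allclose(C @ w, lam_i * w))         # True: eigenvector with the same lambda_i
print(np.isclose(w @ w, N * lam_i))          # True: squared length is N * lambda_i

u_i = w / np.sqrt(N * lam_i)                 # normalize -> Bishop's u_i
print(np.isclose(u_i @ u_i, 1.0))            # True: unit length
```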

(Please note that the above did not use the $\v_i=\X\u_i$ definition that you quoted; instead, we started directly with a unit-length $\v_i$. I believe this might have been the source of your confusion. Bishop uses the $\v_i=\X\u_i$ definition earlier in the section, but it is no longer relevant for this particular argument.)
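
To connect this back to your derivation: if you start from a unit-length $\u_i$ and set $\v_i = \X\u_i$, then $\|\v_i\|^2 = \u_i^\top\X^\top\X\u_i = N\lambda_i$, so this $\v_i$ is not unit length, and your square-root-free identity is correct for it; rescaling it to unit length recovers Bishop's formula. A quick check under the same illustrative setup as above:

```python
import numpy as np

# Same illustrative setup: now start from a unit-length covariance
# eigenvector u_i and define v = X u_i, as in the question's edit.
rng = np.random.default_rng(0)
N, D = 8, 5
X = rng.standard_normal((N, D))
X -= X.mean(axis=0)

lam, U = np.linalg.eigh(X.T @ X / N)
u_i, lam_i = U[:, -1], lam[-1]               # u_i has unit length

v = X @ u_i
print(np.isclose(v @ v, N * lam_i))          # True: this v is NOT unit length

# The question's square-root-free identity holds exactly for this v ...
print(np.allclose(X.T @ v / (N * lam_i), u_i))              # True

# ... and rescaling v to unit length recovers the square-root formula:
v_unit = v / np.sqrt(N * lam_i)
print(np.allclose(X.T @ v_unit / np.sqrt(N * lam_i), u_i))  # True
```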

amoeba