9

In Christopher Bishop's book Pattern Recognition and Machine Learning, the section on PCA contains the following:

Given an $N \times D$ centred data matrix $\mathbf{X}$ (one data point per row) with covariance matrix $N^{-1}\mathbf{X}^T\mathbf{X}$, the eigenvector equation is:

$$N^{-1}\mathbf{X}^T\mathbf{X} \mathbf{u}_i = \lambda_i \mathbf{u}_i.$$

Defining $\mathbf{v}_i = \mathbf{X} \mathbf{u}_i$, Bishop claims that if $\mathbf{u}_i$ and $\mathbf{v}_i$ have unit length, then:

$$\mathbf{u}_i = \frac{1}{(N\lambda_i)^{\frac{1}{2}}}\mathbf{X}^T\mathbf{v}_i.$$

Where does the square root come from?


EDIT:

In particular, why is the following invalid:

$\frac{1}{N}\mathbf{X}^T\mathbf{X}\mathbf{u}_i = \lambda_i \mathbf{u}_i$

$\Rightarrow \frac{1}{N}\mathbf{X}^T \mathbf{v}_i = \lambda_i \mathbf{u}_i$ using $\mathbf{v}_i = \mathbf{X}\mathbf{u}_i$

$\Rightarrow \frac{1}{N\lambda_i}\mathbf{X}^T \mathbf{v}_i = \mathbf{u}_i$

The same result, but without the square root.
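
Indeed, a quick numerical check (random centred data; the sizes and variable names below are purely illustrative, not from the book) seems to confirm this chain of equalities:

```python
import numpy as np

# Sanity check of the derivation above on random centred data.
# N, D and the variable names are arbitrary, for illustration only.
rng = np.random.default_rng(0)
N, D = 8, 5
X = rng.standard_normal((N, D))
X -= X.mean(axis=0)                          # centre the data

lam, U = np.linalg.eigh(X.T @ X / N)         # eigenpairs of the covariance matrix
u_i, lam_i = U[:, -1], lam[-1]               # largest eigenpair; u_i has unit length

v_i = X @ u_i                                # the definition v_i = X u_i
print(np.allclose(X.T @ v_i / (N * lam_i), u_i))   # True -- no square root needed
```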

  • The square root comes from the requirement of PCA that the covariance matrix of the transformed data be the identity matrix. Without it you would be doing $VV^T x = Ix = x$ and you wouldn't have gotten anywhere. – conjectures Jan 25 '16 at 14:56
  • @amoeba I have expanded my question to contain the exact problem I am having; I will also include more context if necessary. – Danny Jan 25 '16 at 16:34
  • "...using $v_i=Xu_i$" is not justified by the defining equation $v_iX=u_i$ and is usually not true. – whuber Jan 25 '16 at 17:48
  • @whuber Whoops, that was a typo; the book does define $v_i = Xu_i$. Thanks. – Danny Jan 25 '16 at 18:07

1 Answer

19

This refers to the short section 12.1.4 PCA for high-dimensional data in Bishop's book. I can see why this section can be a bit confusing: Bishop goes back and forth between $\newcommand{\X}{\mathbf X}\newcommand{\v}{\mathbf v}\newcommand{\u}{\mathbf u}\v_i$ and $\u_i$ using slightly inconsistent notation.

The section is about the relationship between the eigenvectors of the covariance matrix $\frac{1}{N}\X^\top \X$ and the eigenvectors of the Gram matrix $\frac{1}{N}\X \X^\top$ (in the context of PCA). Let $\v_i$ be a unit-length eigenvector of $\frac{1}{N}\X \X^\top$:

$$\frac{1}{N}\X \X^\top \v_i = \lambda_i \v_i.$$

If we multiply this equation by $\X^\top$ from the left:

$$\frac{1}{N}\X^\top \X (\X^\top \v_i) = \lambda_i (\X^\top \v_i),$$

we see that $\X^\top \v_i$ is an eigenvector of $\frac{1}{N}\X^\top \X$.

However, it will not have unit length! Indeed, let us compute its length: $$\|\X^\top \v_i\|^2=(\X^\top \v_i)^\top \X^\top \v_i = \v_i^\top \X\X^\top \v_i=\v_i^\top(N\lambda_i \v_i)=N\lambda_i\|\v_i\|^2=N\lambda_i.$$ So the squared length of $\X^\top \v_i$ is equal to $N\lambda_i$. Therefore, if we want to transform $\v_i$ into a unit-length covariance matrix eigenvector $\u_i$, we need to normalize it to have unit length: $$\u_i = \frac{1}{(N\lambda_i)^{1/2}}\X^\top \v_i.$$
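
Here is a minimal NumPy sketch of this whole argument (the random data and the names `X`, `v_i`, `u_i` are made up purely for illustration):

```python
import numpy as np

# Illustrative random centred data; the sizes are arbitrary.
rng = np.random.default_rng(0)
N, D = 8, 5
X = rng.standard_normal((N, D))
X -= X.mean(axis=0)

C = X.T @ X / N                              # D x D covariance matrix
G = X @ X.T / N                              # N x N Gram matrix

lam, V = np.linalg.eigh(G)                   # eigenpairs of the Gram matrix
v_i, lam_i = V[:, -1], lam[-1]               # largest eigenpair; v_i has unit length

w = X.T @ v_i                                # candidate covariance eigenvector
print(np.allclose(C @ w, lam_i * w))         # True: eigenvector with the same lambda_i
print(np.isclose(w @ w, N * lam_i))          # True: squared length is N * lambda_i

u_i = w / np.sqrt(N * lam_i)                 # normalize -> Bishop's u_i
print(np.isclose(u_i @ u_i, 1.0))            # True: unit length
```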

(Please note that the above did not use the $\v_i=\X\u_i$ definition that you quoted; instead, we started directly with a unit-length $\v_i$. I believe this might have been the source of your confusion. Bishop uses the $\v_i=\X\u_i$ definition earlier in the section, but it is no longer relevant for this particular argument.)
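
To connect this back to your derivation: if you start from a unit-length $\u_i$ and set $\v_i = \X\u_i$, then $\|\v_i\|^2 = \u_i^\top\X^\top\X\u_i = N\lambda_i$, so this $\v_i$ is not unit length, and your square-root-free identity is correct for it; rescaling it to unit length recovers Bishop's formula. A quick check under the same illustrative setup as above:

```python
import numpy as np

# Same illustrative setup: now start from a unit-length covariance
# eigenvector u_i and define v = X u_i, as in the question's edit.
rng = np.random.default_rng(0)
N, D = 8, 5
X = rng.standard_normal((N, D))
X -= X.mean(axis=0)

lam, U = np.linalg.eigh(X.T @ X / N)
u_i, lam_i = U[:, -1], lam[-1]               # u_i has unit length

v = X @ u_i
print(np.isclose(v @ v, N * lam_i))          # True: this v is NOT unit length

# The question's square-root-free identity holds exactly for this v ...
print(np.allclose(X.T @ v / (N * lam_i), u_i))              # True

# ... and rescaling v to unit length recovers the square-root formula:
v_unit = v / np.sqrt(N * lam_i)
print(np.allclose(X.T @ v_unit / np.sqrt(N * lam_i), u_i))  # True
```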

amoeba