
Could someone please help me understand why, when we project a new sample onto an eigenvector $\boldsymbol\alpha$ in kernel PCA, we normalize by dividing the eigenvector $\boldsymbol\alpha$ by its eigenvalue $\lambda$?

I understand that we need to normalize the eigenvector $\mathbf{V}$, and hence the eigenvector $\boldsymbol\alpha$, following the criterion described in Eq. 9 of the original paper: Schölkopf et al. 1997, Kernel Principal Component Analysis.

However, I couldn't figure out why we divide by $\lambda$ in the normalization, instead of, e.g., by $\|\boldsymbol\alpha^{k}\|\sqrt{\lambda}$ (see the derivation below).

Eq. 9 requires

$$1 = \lambda_k(\boldsymbol\alpha^{k*}\cdot\boldsymbol\alpha^{k*})$$

where $\boldsymbol\alpha^{k*}$ is the normalized $\boldsymbol\alpha^k$. Since $\boldsymbol\alpha^k\cdot\boldsymbol\alpha^k = \|\boldsymbol\alpha^k\|^2$,

$$\boldsymbol\alpha^{k*} = \frac{\boldsymbol\alpha^{k}}{\|\boldsymbol\alpha^{k}\|\sqrt{\lambda_k}}$$

Eq. 10 (projection of a test point $\boldsymbol\phi(\mathbf{x})$) becomes

$$(\mathbf{V}^{k*}\cdot\boldsymbol\phi(\mathbf{x})) = \sum_{i=1}^{l}\alpha_i^{k*} (\boldsymbol\phi(\mathbf{x}_i)\cdot\boldsymbol\phi(\mathbf{x})) = \frac{1}{\|\boldsymbol\alpha^{k}\|\sqrt{\lambda_k}}\sum_{i=1}^{l} \alpha_i^{k}(\boldsymbol\phi(\mathbf{x}_i)\cdot\boldsymbol\phi(\mathbf{x}))$$ $$= \frac{1}{\sqrt{\lambda_k}}\sum_{i=1}^{l} \alpha_i^{k}(\boldsymbol\phi(\mathbf{x}_i)\cdot\boldsymbol\phi(\mathbf{x}))$$

where $\mathbf{V}^{k*}$ is the normalized $\mathbf{V}^{k}$ and the last equality uses $\|\boldsymbol\alpha^k\| = 1$. But I know that the above normalization is incorrect, because instead of dividing by $\sqrt{\lambda_k}$ we should simply divide by $\lambda_k$. So I am not sure where I am making a mistake here.

For a nice Python implementation of kernel PCA and of the eigenvector normalization, please refer to the following site: http://sebastianraschka.com/Articles/2014_kernel_pca.html. You can go straight to

`def project_x(x_new, X, gamma, alphas, lambdas):`

to see that we need to normalize $\boldsymbol\alpha$ by dividing it by $\lambda$ in order to get the correct projection.
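
For context, that function is essentially the following (a minimal sketch, assuming the RBF kernel used on that page; `alphas` are the eigenvectors and `lambdas` the eigenvalues of the centered kernel matrix, following the blog's naming):

    import numpy as np

    def project_x(x_new, X, gamma, alphas, lambdas):
        # Squared Euclidean distances between the new point and each training point
        pair_dist = np.array([np.sum((x_new - row) ** 2) for row in X])
        # RBF kernel values k(x_i, x_new)
        k = np.exp(-gamma * pair_dist)
        # Note: the eigenvectors are divided by lambda, not sqrt(lambda)
        return k.dot(alphas / lambdas)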

Update:

The above normalization is correct. The question arose because I misunderstood Sebastian's code; the reconciliation is spelled out below.
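
In short (this is amoeba's point from the comments below): applying the correctly normalized projection of Eq. 10 to a training point $\mathbf{x}_j$, and using $\mathbf{K}\boldsymbol\alpha^k = \lambda_k\boldsymbol\alpha^k$,

$$(\mathbf{V}^{k*}\cdot\boldsymbol\phi(\mathbf{x}_j)) = \frac{1}{\sqrt{\lambda_k}}\sum_{i=1}^{l}\alpha_i^{k}K_{ij} = \frac{1}{\sqrt{\lambda_k}}\left(\mathbf{K}\boldsymbol\alpha^k\right)_j = \sqrt{\lambda_k}\,\alpha_j^k,$$

so the correctly scaled training scores on component $k$ are $\sqrt{\lambda_k}\,\boldsymbol\alpha^k$. Sebastian's code stores the unscaled $\alpha_j^k$ as the training projections, so his out-of-sample routine divides by $\lambda_k$: one factor of $\sqrt{\lambda_k}$ comes from the Eq. 9 normalization, and the other compensates for the missing $\sqrt{\lambda_k}$ scaling of his training scores. The properly normalized projection divides by $\sqrt{\lambda_k}$, exactly as derived above.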

Starz

  • It seems to me that in order to get $\alpha^k \cdot \alpha^k = 1$ (normalization), you need to divide by $\lambda$. This is simply algebra, as per Eq. 9. Or are you asking to clarify what's happening in Eq. 9? – ilanman Dec 22 '16 at 23:15
  • @ilanman: Yes, I am asking why the normalization factor for $\alpha$ is $\lambda$. Doesn't Eq. 9 imply that $||\alpha|| = 1/\sqrt\lambda$? So shouldn't we divide $\alpha$ by $\sqrt\lambda$ instead? But $\sqrt\lambda$ is incorrect: if you go to the link below and search for "project_x(x_new, X, gamma, alphas, lambdas)", we need to divide $\alpha$ by $\lambda$ in order to get the right projection: http://sebastianraschka.com/Articles/2014_kernel_pca.html – Starz Dec 24 '16 at 17:12
  • Yes, $||\alpha|| = \frac{1}{\sqrt{\lambda}}$. But since we want $\alpha^T\alpha=1$, we end up dividing by $\lambda$. – ilanman Dec 24 '16 at 20:10
  • @ilanman: Hmmm, I'm sorry, I'm dumb :P. I still can't get why, when we calculate Eq. 10, we divide by $\sqrt\lambda$ while there is only one $\alpha$ on the right-hand side of the equation. – Starz Dec 24 '16 at 20:33
  • How about editing your question to include the steps you would have thought were correct (i.e., using $\frac{1}{\sqrt{\lambda}}$), and maybe we can figure out what's missing. – ilanman Dec 24 '16 at 22:01
  • @ilanman: I added the steps that I thought were correct but are actually wrong. – Starz Dec 25 '16 at 03:29
  • @amoeba: Sorry for the confusion. I don't know how to normalize the eigenvector $\alpha$ to satisfy the criterion given by Eq. 9. Based on my incorrect derivation (I added some edits above), I have to divide by $(||\alpha^k||\sqrt\lambda)$. However, the correct normalization should be simply dividing by $\lambda$, as shown by Sebastian Raschka on his site (I also added this to the question). – Starz Dec 25 '16 at 03:51
  • As to your main question, see my answer here http://stats.stackexchange.com/questions/126014. I think one should divide by $\sqrt{\lambda}$ and so you are right. No idea what goes on in the code on sebastianraschka.com and frankly do not feel very motivated to find out. It might very well be that something is wrong there. – amoeba Dec 25 '16 at 10:00
  • Agreed. What you have makes sense to me. Not sure what the other guy does in his code. Can you find elsewhere an example of dividing by $\lambda$ instead of $\sqrt{\lambda}$? – ilanman Dec 25 '16 at 21:20
  • I think I can see what happens in that code. He computes `X_pc` without any normalization (whereas the "correct" normalization involves a square root of lambda - see my linked answer). So he computes the "out of sample" projection with 1/lambda instead of 1/sqrt(lambda), and it fits his PCs because those were NOT multiplied by sqrt(lambda). (A numerical sketch of this point follows the comments below.) – amoeba Dec 25 '16 at 22:01
  • @amoeba: YES!!! You are correct! Could you help me see how you get the following? $$\mathbf X_\mathrm{projected\:on\: axis \:\#i} = \sqrt{\lambda_i} \mathbf E_i$$ So far I had misunderstood and thought that we don't need the sqrt(lambda). Sorry, my brain is fried after going through all the math; I think this should be pretty obvious from the derivation, but I couldn't see it. – Starz Dec 25 '16 at 23:08
  • Also, just for confirmation: using kernel PCA from scikit-learn, I confirmed that the correct normalization is dividing by sqrt(lambda). – Starz Dec 25 '16 at 23:10
  • @amoeba: I am reading your derivation again, and will just post my question on your answer in the link: http://stats.stackexchange.com/questions/126014/how-to-project-a-new-vector-onto-the-pc-space-using-kernel-pca. Thank you for helping; I really appreciate it. – Starz Dec 26 '16 at 02:41
  • @amoeba: Apparently, because my reputation is less than 50, I can't comment on your answer to ask questions, so I will ask two questions here. 1) In your derivation, isn't the covariance matrix $\mathbf X$ in linear PCA also a Gram matrix? 2) In kernel PCA, is the kernel matrix $\mathbf K = \mathbf X \mathbf X^\top$, where $\mathbf X$ is a covariance matrix as shown in your third equation? I could follow the rest of the derivation, but I am having trouble relating the second and third equations to kernel PCA. – Starz Dec 26 '16 at 05:04
  • I edited my linked answer to clarify things; see if it became clearer. (1) What I call the "Gram matrix" for a data matrix X is XX'. The covariance matrix is X'X/n. See my edits. (2) X is a *data matrix*, not a covariance matrix. Again, see my edits. – amoeba Dec 26 '16 at 10:22
  • @amoeba: Ahh, very nice; it's all clear now. I never realized the connection with the SVD of $\mathbf X$ even in PCA; so far I always had the SVD of the covariance matrix in mind. Thank you so much. When I was doing some research last night I found this paper, which you are probably already aware of: https://arxiv.org/pdf/1407.2904v1.pdf. When you have time (no rush), would you mind helping me with question 2 of the following link? http://stats.stackexchange.com/questions/252616/linear-discriminant-analysis-2-class-3d-data-sets-and-biplot – Starz Dec 26 '16 at 14:40
  • See my answer at http://stats.stackexchange.com/questions/134282 about the SVD/PCA connection. I will take a look at your other question later. – amoeba Dec 26 '16 at 17:01
  • @amoeba: I added a star and an upvote to your question :) – Starz Dec 26 '16 at 18:15
  • Linear algebra never ceases to impress. – Starz Dec 26 '16 at 18:24
  • Lam, there are currently 4 (out of the 5 necessary) votes to close this question as a duplicate of the earlier linked thread with my answer. Are you fine with this closing? If so, you can approve it yourself. Or would you prefer this to stay open and get a separate answer? – amoeba Dec 28 '16 at 14:06
  • Sure; I am outside and not in front of a computer now, and the approve button doesn't show on my phone. Please go ahead and close. – Starz Dec 28 '16 at 14:09
  • @amoeba: Back at my computer; I don't see any button to close. By the way, I am new to Stack Exchange; will just deleting do? – Starz Dec 29 '16 at 23:45
  • It's closed already. Don't worry. – amoeba Dec 30 '16 at 08:06
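
For anyone who wants to verify amoeba's point numerically, here is a small self-contained sketch (toy data and an arbitrary kernel width; the data, `gamma`, and variable names are illustrative, not from Sebastian's post). Projecting a training point with $1/\lambda$ reproduces the unscaled eigenvector entry $\alpha_j$, while $1/\sqrt{\lambda}$ reproduces the correctly scaled score $\sqrt{\lambda}\,\alpha_j$:

    import numpy as np

    # Toy data and an RBF kernel (gamma chosen arbitrarily for illustration)
    rng = np.random.RandomState(0)
    X = rng.randn(20, 3)
    gamma = 0.5
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq_dists)

    # Center the kernel matrix
    n = K.shape[0]
    one_n = np.ones((n, n)) / n
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # Leading eigenpair of the centered kernel matrix (eigh sorts ascending)
    lambdas, alphas = np.linalg.eigh(Kc)
    lam, alpha = lambdas[-1], alphas[:, -1]   # ||alpha|| = 1

    # Treat training point j as if it were "out of sample": its centered
    # kernel values against the training set are just row j of Kc.
    j = 0
    k_row = Kc[j]
    # 1/lambda reproduces the raw eigenvector entry (Sebastian's convention)
    print(np.isclose(k_row @ alpha / lam, alpha[j]))
    # 1/sqrt(lambda) reproduces the correctly scaled score sqrt(lambda)*alpha_j
    print(np.isclose(k_row @ alpha / np.sqrt(lam), np.sqrt(lam) * alpha[j]))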

0 Answers