I am trying to get a deep understanding of PCA. From my understanding, a principal component is defined as $$\mathbf{z}_k = \phi_{1,k} \mathbf{x}_1 + \ldots + \phi_{p,k} \mathbf{x}_p = \mathbf{X} \boldsymbol{\phi}_k, \tag{1}$$ where $\boldsymbol{\phi}_k = (\phi_{1,k}, \ldots, \phi_{p,k})$ is a vector of scalars and $\mathbf{x}_j$ is the $j^{\text{th}}$ predictor. In other words, a principal component is a linear combination of the original predictors. The loading vectors $\boldsymbol{\phi}_k$ are chosen to maximize the variance of the principal components, i.e. to maximize $\mathrm{Var}(\mathbf{X} \boldsymbol{\phi}_k)$, and as a result the loading vectors are mutually orthogonal, i.e. $\langle \boldsymbol{\phi}_k , \boldsymbol{\phi}_{\ell} \rangle = 0$ whenever $k \neq \ell$. Also, during the optimization we constrain each loading vector to have unit length, so $\| \boldsymbol{\phi}_k \|_2 = 1$ for all $k$.
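As a quick sanity check of these two properties (a minimal sketch of my own, using `prcomp` on simulated data rather than anything from the book), the loading vectors R returns are unit length and mutually orthogonal, and each score vector is exactly the linear combination in $(1)$:

set.seed(1)
X = scale(matrix(rnorm(600), ncol = 3), center = TRUE, scale = FALSE)  # 200 x 3 centered data matrix
pca = prcomp(X, center = FALSE)
Phi = pca$rotation                 # columns are the loading vectors phi_k
round(crossprod(Phi), 10)          # identity matrix: unit-length, mutually orthogonal loadings
max(abs(pca$x - X %*% Phi))        # ~ 0: each score vector is X %*% phi_k, as in (1)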

To write this more compactly: if the columns of a matrix $\mathbf{Z}$ are the principal components and the columns of $\mathbf{\Phi}$ are the loading vectors, we have $$\mathbf{Z} = \mathbf{X} \mathbf{\Phi}. \tag{2}$$ As a result of the two conditions above, the matrix $\mathbf{\Phi}$ is orthogonal, meaning $\mathbf{\Phi}^{-1} = \mathbf{\Phi}^T$. So right-multiplying both sides of $(2)$ by $\mathbf{\Phi}^T$ gives us $$ \mathbf{X} = \mathbf{Z} \mathbf{\Phi}^T. \tag{3}$$ It is worth noting that in practice, $(3)$ is computed using the singular value decomposition $\mathbf{X} = \mathbf{U} \mathbf{D} \mathbf{V}^T$, where $\mathbf{Z} = \mathbf{U} \mathbf{D}$ and $\mathbf{\Phi} = \mathbf{V}$.
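To see $(2)$, $(3)$, and the SVD connection numerically (again just a sketch of my own with simulated data, not from the book):

set.seed(1)
X = scale(matrix(rnorm(600), ncol = 3), center = TRUE, scale = FALSE)  # centered 200 x 3 data matrix
sv = svd(X)                          # X = U D V^T
Z = sv$u %*% diag(sv$d)              # scores: Z = U D
Phi = sv$v                           # loadings: Phi = V
max(abs(Z - X %*% Phi))              # ~ 0: Z = X Phi, equation (2)
max(abs(X - Z %*% t(Phi)))           # ~ 0: X = Z Phi^T, equation (3)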

Rewriting the two matrices on the right-hand side of $(3)$ as $\mathbf{Z} = (\mathbf{z}_1, \ldots, \mathbf{z}_p)$ and $\mathbf{\Phi}^T = (\boldsymbol{\phi}_1^T, \ldots, \boldsymbol{\phi}_p^T)^T$, we get $$ \begin{align} \mathbf{X} &= \begin{pmatrix} \mathbf{z}_1 & \cdots & \mathbf{z}_p \end{pmatrix} \begin{pmatrix} \boldsymbol{\phi}_1^T \\ \vdots \\ \boldsymbol{\phi}_p^T \end{pmatrix} \\ &= \mathbf{z}_1 \boldsymbol{\phi}_1^T + \ldots + \mathbf{z}_p \boldsymbol{\phi}_p^T \\ &= (\mathbf{X} \boldsymbol{\phi}_1) \boldsymbol{\phi}_1^T + \ldots + (\mathbf{X} \boldsymbol{\phi}_p) \boldsymbol{\phi}_p^T\tag{4} \\ &= \mathbf{X} \Big( \boldsymbol{\phi}_1 \boldsymbol{\phi}_1^T + \ldots + \boldsymbol{\phi}_p \boldsymbol{\phi}_p^T \Big). \end{align}$$ From this, it has to be true that $\Big( \boldsymbol{\phi}_1 \boldsymbol{\phi}_1^T + \ldots + \boldsymbol{\phi}_p \boldsymbol{\phi}_p^T \Big) = \mathbf{I}$. Here is some empirical evidence that this is true, using the simple case of two predictors:

set.seed(100)
x1 = rnorm(2000); x2 = x1 + 0.5*rnorm(2000)   # two correlated predictors
mat = matrix(c(x1, x2), ncol = 2)             # 2000 x 2 data matrix X
matsvd = svd(mat)                             # X = U D V^T
D = diag(2); diag(D) = matsvd$d               # 2 x 2 diagonal matrix of singular values
score = matsvd$u %*% D                        # scores Z = U D
load = matsvd$v                               # loadings Phi = V

load[,1] %*% t(load[,1]) + load[,2] %*% t(load[,2])   # sum of outer products phi_k phi_k^T
     [,1] [,2]
[1,]    1    0
[2,]    0    1
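The same objects also confirm the rank-one expansion in $(4)$, i.e. that the outer products $\mathbf{z}_k \boldsymbol{\phi}_k^T$ sum back to $\mathbf{X}$ (an extra check I added on top of the code above):

recon = score[,1] %*% t(load[,1]) + score[,2] %*% t(load[,2])
max(abs(mat - recon))   # ~ 0: X = z_1 phi_1^T + z_2 phi_2^T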

My problem is that I cannot come up with an intuitive reason why $\boldsymbol{\phi}_1 \boldsymbol{\phi}_1^T + \ldots + \boldsymbol{\phi}_p \boldsymbol{\phi}_p^T = \mathbf{I}$ must hold, and I was wondering if anyone could provide one. Is there any significant meaning behind $\boldsymbol{\phi}_k \boldsymbol{\phi}_k^T$? (As answered by @gunes below, since $\mathbf{\Phi}$ is orthogonal and square, we have $\mathbf{\Phi} \mathbf{\Phi}^T = \mathbf{I}$.)

EDIT

I would also like to know if my definitions are correct. I stated that $\boldsymbol{\phi}_k$ is the loading vector for the $k^{\text{th}}$ principal component, so the matrix $\mathbf{\Phi}$ would be the loading matrix. I am getting this definition from Section 10.2.1 of An Introduction to Statistical Learning. However, I have also seen (for example, here) the loading vector defined as $\boldsymbol{\phi}_k = d_k \boldsymbol{v}_k$, i.e. the $k^{\text{th}}$ loading vector is the $k^{\text{th}}$ right singular vector scaled up by the $k^{\text{th}}$ singular value. So which definition is correct?
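For reference, the two conventions are easy to compare using the matsvd object from the code above (my own check, not taken from either source): ISL's usage corresponds to the unit-norm columns of $\mathbf{V}$, the other to those columns scaled by the singular values.

directions = matsvd$v                    # unit-norm right singular vectors (ISL's phi_k)
scaled = matsvd$v %*% diag(matsvd$d)     # the same vectors scaled by d_k (the other convention)
colSums(directions^2)                    # 1 1  -> each column has unit length
colSums(scaled^2)                        # equals matsvd$d^2 -> lengths carry the singular values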

  • To your last section about definitions. Some people, texts, and programs call "loadings" the eigenvector entries, and some call "loadings" those entries scaled up by the corresponding eigen- (or singular) values. The second way is better for a number of reasons, including linguistic ones, and I would strongly [recommend](https://stats.stackexchange.com/a/35653/3277) following it. (cont.) – ttnphns Jul 18 '19 at 08:34
  • (cont.) And even in the current Wikipedia article on PCA, if you read it through, you'll find that one paragraph implies the word "loadings" applies both to the unit-scaled direction vectors and to those vectors scaled by the eigen- (or singular) values, while a later section defines "loadings" as the label for the second only. – ttnphns Jul 18 '19 at 08:35
  • Ah okay, so it would be best to have $\mathbf{\Phi}$ be the *principal directions*, and the loadings would be given by $\mathbf{\Phi} \mathbf{D}$. The notational inconsistency is annoying, as it makes learning about something new more difficult than it needs to be! – akenny430 Jul 19 '19 at 01:09

1 Answer

We know (and you also stated) that $\mathbf{\Phi}$ is an orthogonal matrix, i.e. $\mathbf{\Phi}\mathbf{\Phi}^T=\mathbf{I}$. Expanding the left-hand side gives $$\mathbf{\Phi}\mathbf{\Phi}^T=\begin{pmatrix}\boldsymbol{\phi}_1 & \cdots & \boldsymbol{\phi}_p\end{pmatrix}\begin{pmatrix}\boldsymbol{\phi}_1^T \\ \vdots \\ \boldsymbol{\phi}_p^T\end{pmatrix}=\sum_{i=1}^p\boldsymbol{\phi}_i\boldsymbol{\phi}_i^T=\mathbf{I}.$$
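One step that may be worth spelling out (an editorial addition, not part of the original answer): the unit-length and orthogonality constraints only say directly that the columns of $\mathbf{\Phi}$ are orthonormal, i.e. $\mathbf{\Phi}^T \mathbf{\Phi} = \mathbf{I}$. Because $\mathbf{\Phi}$ is $p \times p$ (square), a left inverse is also a right inverse, so $$\mathbf{\Phi}^T \mathbf{\Phi} = \mathbf{I} \;\Longrightarrow\; \mathbf{\Phi}^{-1} = \mathbf{\Phi}^T \;\Longrightarrow\; \mathbf{\Phi}\mathbf{\Phi}^T = \mathbf{\Phi}\mathbf{\Phi}^{-1} = \mathbf{I},$$ which is exactly the sum of outer products above.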

gunes