
I have a high-dimensional data matrix X in which the sample size is smaller than the number of variables. I want to use PCA as a dimensionality reduction method, but I cannot call it directly in R since the matrix X is rank-deficient. I saw the technique below in some R code for getting the principal component representation of a rank-deficient matrix:

1) Get U from `svd(XX^T)`

2) Get the principal component representation C by solving X = UC

I am new to PCA, and I have no idea how that makes sense. Could someone clarify how those 2 steps are equivalent to applying PCA on X?
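For concreteness, here is a minimal R sketch of those two steps on made-up data (an illustration, not the original code: it assumes the rows of X are observations and the columns have been centered; because U is orthogonal, solving X = UC amounts to `C = t(U) %*% X`, which is also what the comments below describe):

```r
set.seed(1)
n <- 5; p <- 20                      # fewer samples than variables
X <- matrix(rnorm(n * p), n, p)
X <- sweep(X, 2, colMeans(X))        # column-center the data, as PCA assumes

e <- svd(X %*% t(X))                 # step 1: SVD of X X^T (an n x n matrix)
U <- e$u                             # n x n orthogonal matrix

C <- t(U) %*% X                      # step 2: X = U C  =>  C = U^T X
all.equal(X, U %*% C)                # TRUE: U C reconstructs X
```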

user5054

  • Probable duplicate, at least closely related: http://stats.stackexchange.com/q/147880/3277. – ttnphns May 24 '15 at 04:51
  • There are many questions asked about the relation between PCA and SVD, but my specific question is that I would expect the 1st step to be `svd(X)` rather than `svd(XX^T)`. – user5054 May 24 '15 at 04:56
  • Note that since `XX'` (as well as `X'X`) is square symmetric, its svd() = its eigendecomposition(). – ttnphns May 24 '15 at 04:59
  • Principal component analysis (PCA) is usually explained via "an eigen-decomposition of the covariance matrix (`XX^T`)" or via "a singular value decomposition (SVD) of the data matrix itself (`X`)". That's what confuses me. Is it okay to use either `svd(X)` or `svd(XX^T)` in the 1st step? – user5054 May 24 '15 at 05:12
  • Yes (http://stats.stackexchange.com/q/79043/3277), you may use both. – ttnphns May 24 '15 at 05:22
  • Thanks, this helps a lot! Also, is there a problem with the 2nd part? It calculates the principal components using `U^T * X`, but what is explained in the link you sent is to calculate principal components using `U * S`, where S is the matrix of the singular values. – user5054 May 24 '15 at 05:56
  • What do you mean you cannot call it directly? Just use `prcomp` and not `princomp`. – amoeba May 24 '15 at 06:59
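To see numerically what these comments establish, here is a short R sketch (made-up centered data; a check of the claim, assuming generically distinct singular values — singular vectors match only up to sign, and only the first r = rank(X) of them are determined):

```r
set.seed(1)
n <- 5; p <- 20
X <- matrix(rnorm(n * p), n, p)
X <- sweep(X, 2, colMeans(X))        # column-center

s <- svd(X)                          # SVD of the data matrix itself
e <- svd(X %*% t(X))                 # SVD (= eigendecomposition) of X X^T

r <- sum(s$d > 1e-8 * s$d[1])        # numerical rank (n - 1 here, due to centering)
all.equal(s$d[1:r], sqrt(e$d[1:r]))  # singular values of X = sqrt(eigenvalues of X X^T)
max(abs(abs(s$u[, 1:r]) - abs(e$u[, 1:r])))  # ~0: same left vectors, up to sign
```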

1 Answer

PCA amounts to finding and interpreting a singular value decomposition (SVD) of $X$ (or any closely-related matrix obtained by various forms of "standardizing" the rows and/or columns of $X$). There are direct connections between any SVD of $X$ and any SVD of $XX^\prime$. I will exhibit the connections in both directions, by obtaining an SVD of one from an SVD of the other.
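As a quick R sanity check of this claim (a sketch on made-up data; `prcomp` centers the columns by default, so its scores should match $U\Sigma$ from the SVD of the centered matrix, up to the sign of each component):

```r
set.seed(1)
X <- matrix(rnorm(5 * 20), 5, 20)    # n = 5 samples, p = 20 variables

pc <- prcomp(X)                      # PCA; prcomp centers the columns itself
s  <- svd(scale(X, center = TRUE, scale = FALSE))

# Principal component scores are U * Sigma, up to the sign of each column:
max(abs(abs(pc$x) - abs(s$u %*% diag(s$d))))   # ~ 0
```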


1. Any SVD of $X$ determines a unique SVD of $XX^\prime$.

By definition, an SVD of $X$ is a representation in the form

$$X = U\Sigma V^\prime$$

where $\Sigma$ is diagonal with non-negative entries and $U$ and $V$ are orthogonal matrices. Among other things, this implies $V^\prime V = \mathbb{I}$, the identity matrix.

Compute

$$XX^\prime = (U\Sigma V^\prime)(U\Sigma V^\prime)^\prime= (U\Sigma V^\prime)(V\Sigma U^\prime) = U\Sigma V^\prime V \Sigma U^\prime = U\Sigma \mathbb{I} \Sigma U^\prime = U (\Sigma^2) U^\prime.$$

Since $\Sigma^2$ is diagonal with non-negative entries and $U$ is orthogonal, this is an SVD of $XX^\prime$.
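In R, this direction is a one-line check (a sketch on arbitrary data):

```r
set.seed(1)
X <- matrix(rnorm(5 * 20), 5, 20)    # any real matrix will do
s <- svd(X)                          # X = U Sigma V'
all.equal(X %*% t(X), s$u %*% diag(s$d^2) %*% t(s$u))   # TRUE
```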


2. Any SVD of $XX^\prime$ gives (at least one) SVD of $X$.

Conversely, because $XX^\prime$ is symmetric and positive semi-definite, by means of an SVD or with the Spectral Theorem it can be diagonalized via an orthogonal transformation $U$ (for which both $UU^\prime$ and $U^\prime U$ are identity matrices):

$$XX^\prime = U\Lambda^2 U^\prime.$$

In this decomposition, $\Lambda$ is an $n\times n$ diagonal matrix ($n$ is the number of rows of $X$) with $r = \text{rank}(X)$ non-zero entries which--without any loss of generality--we may assume are the first $r$ entries $\Lambda_{ii}, i=1,2,\ldots, r$. Define $\Lambda^{-}$ to be the diagonal matrix with entries $1/\Lambda_{ii}, i=1,2,\ldots, r$ (and zeros otherwise). It is a generalized inverse of $\Lambda$. Let

$$Y = \Lambda^{-}U^\prime X$$

and compute

$$YY^\prime = \Lambda^{-} U^\prime XX^\prime U \Lambda^{-} = \Lambda^{-} \Lambda^2 \Lambda^{-} = \mathbb{I}_{r;n}.$$

(I have employed the notation $\mathbb{I}_{r;n}$ for an identity-like $n\times n$ matrix of rank $r$: it has $r$ ones along the diagonal and zeros everywhere else.) Let $p$ be the number of columns of $X$; certainly $r \le p$. This result implies the first $r$ rows of $Y$ are orthonormal vectors in $\mathbb{R}^p$, while the remaining $n-r$ rows, having zero length, must be zero. That is, $Y$ is a block matrix of the form

$$Y = \pmatrix{W \\ 0}$$

with $W$ an $r\times p$ matrix having orthonormal rows ($WW^\prime = \mathbb{I}_r$). The zero block has appropriate dimensions to fill out the rest of $Y$ (which has the same dimensions as $X$). If $r \lt p$, we may extend the rows of $W$ to a complete orthonormal basis of $\mathbb{R}^p$. This can be done in many ways (Gram-Schmidt, for instance). Writing these basis vectors as columns, in order, produces a $p\times p$ orthogonal matrix $V$ whose first $r$ columns are the columns of $W^\prime$.

Since (by construction) $V^\prime V = VV^\prime = \mathbb{I}_p$, and because $WW^\prime = \mathbb{I}_r$ while the remaining columns of $V$ are orthogonal to the rows of $W$, we have $YV = \mathbb{I}_{r;n\times p}$, the $n\times p$ matrix with $r$ ones along its diagonal. Moreover, the last $n-r$ rows of $U^\prime X$ vanish: writing $u_i$ for the columns of $U$, $\Vert X^\prime u_i\Vert^2 = u_i^\prime XX^\prime u_i = \Lambda_{ii}^2 = 0$ whenever $i \gt r$. Therefore $\Lambda Y = \Lambda\Lambda^{-}U^\prime X = U^\prime X$, whence

$$U^\prime X V = \Lambda Y V = \Lambda\, \mathbb{I}_{r;n\times p} = \Sigma,$$

the $n\times p$ diagonal matrix with the $\Lambda_{ii}$ along its diagonal.

Upon left-multiplying by $U$ and right-multiplying by $V^\prime$ we obtain

$$U\Sigma V^\prime = (UU^\prime) X (VV^\prime) = X,$$

which is an SVD of $X$.
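For readers who want to see this construction run, here is a numerical sketch in R following the steps above on made-up data. The completion of $W$ to the orthogonal matrix $V$ is done here with a null-space basis obtained from `svd`, one of the many possible ways:

```r
set.seed(2)
n <- 4; p <- 7
X <- matrix(rnorm(n * p), n, p)
X[n, ] <- X[1, ] + X[2, ]                # force rank deficiency: rank(X) = n - 1

e <- svd(X %*% t(X))                     # X X' = U Lambda^2 U'
U <- e$u
lambda <- sqrt(e$d)                      # candidate diagonal entries of Lambda
r <- sum(lambda > 1e-6 * lambda[1])      # numerical rank of X
if (r < n) lambda[(r + 1):n] <- 0        # zero out the negligible trailing values

Lminus <- diag(c(1 / lambda[1:r], rep(0, n - r)))  # generalized inverse Lambda^-
Y <- Lminus %*% t(U) %*% X               # Y = Lambda^- U' X
W <- Y[1:r, , drop = FALSE]              # first r rows of Y
all.equal(W %*% t(W), diag(r))           # TRUE: rows of W are orthonormal

# Extend the rows of W to an orthonormal basis of R^p, giving orthogonal V:
N <- svd(W, nu = 0, nv = p)$v[, (r + 1):p]   # orthonormal basis of null(W)
V <- cbind(t(W), N)                      # p x p with V' V = I_p

Sigma <- matrix(0, n, p)                 # Lambda padded to the shape of X
diag(Sigma) <- lambda

all.equal(X, U %*% Sigma %*% t(V))       # TRUE: an SVD of X from an SVD of X X'
```

Any other completion of $W$ (for example, Gram-Schmidt via `qr`) would give a different but equally valid $V$.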


whuber