How to whiten the data using principal component analysis?

Question

I want to transform my data $\mathbf X$ such that the variances will be one and the covariances will be zero (i.e. I want to whiten the data). Furthermore, the means should be zero.

I know I will get there by doing Z-standardization and PCA-transformation, but in which order should I do them?

I should add that the composed whitening transformation should have the form $\mathbf{x} \mapsto W\mathbf{x} + \mathbf{b}$.

Is there a method similar to PCA which does exactly both these transformations and gives me a formula of the form above?

(My first comment was based on misreading your question.) PCA gives you zero covariances; you can standardize the PCs afterwards if you wish. It sounds an odd thing to do, but you can do it. — Nick Cox, Apr 30 '14 at 11:59
@NickCox Maybe it seems odd because the transformed data is then spherical, which seems uninformative. However, it is the transformation I need to know, and not the end result. Still I don't know what the transformation would look like. I'm still reading up on PCA, though. — Angelorf, Apr 30 '14 at 12:24

score 39 · Accepted Answer · edited Apr 13 '17 at 12:44

First, you get zero mean by subtracting the sample mean $\boldsymbol \mu = \frac{1}{N}\sum_{i=1}^N \mathbf{x}_i$ from every data point.

Second, you get the covariances zero by doing PCA. If $\boldsymbol \Sigma$ is the covariance matrix of your data, then PCA amounts to performing an eigendecomposition $\boldsymbol \Sigma = \mathbf{U} \boldsymbol \Lambda \mathbf{U}^\top$, where $\mathbf{U}$ is an orthogonal rotation matrix composed of eigenvectors of $\boldsymbol \Sigma$, and $\boldsymbol \Lambda$ is a diagonal matrix with eigenvalues on the diagonal. Matrix $\mathbf{U}^\top$ gives a rotation needed to de-correlate the data (i.e. maps the original features to principal components).

Third, after the rotation each component will have variance given by the corresponding eigenvalue. So to make the variances equal to $1$, you need to divide each component by the square root of its eigenvalue, i.e. multiply by $\boldsymbol \Lambda^{-1/2}$.

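All together, the whitening transformation is $\mathbf{x} \mapsto \boldsymbol \Lambda^{-1/2} \mathbf{U}^\top (\mathbf{x} - \boldsymbol \mu)$. You can open the brackets to get the form you are looking for: $W = \boldsymbol \Lambda^{-1/2} \mathbf{U}^\top$ and $\mathbf{b} = -\boldsymbol \Lambda^{-1/2} \mathbf{U}^\top \boldsymbol \mu$.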
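
The three steps can be sketched in NumPy as follows. This is a minimal illustration, not part of the original answer: the function name `fit_pca_whitening` and the synthetic data are made up for the example, observations are assumed to be stored in rows, and the covariance matrix is assumed to have full rank (all eigenvalues strictly positive).

```python
import numpy as np


def fit_pca_whitening(X):
    """Return (W, b) so that z = W @ x + b whitens the rows of X.

    Hypothetical helper for illustration; assumes a full-rank covariance.
    """
    mu = X.mean(axis=0)                  # step 1: sample mean
    Sigma = np.cov(X, rowvar=False)      # covariance, features in columns
    eigvals, U = np.linalg.eigh(Sigma)   # step 2: Sigma = U diag(eigvals) U^T
    W = (U / np.sqrt(eigvals)).T         # step 3: Lambda^{-1/2} U^T
    b = -W @ mu                          # offset from opening the brackets
    return W, b


# Synthetic correlated data with non-zero mean (made up for the example).
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.5, 0.0],
              [0.5, -0.3, 0.7]])
X = rng.normal(size=(1000, 3)) @ A.T + np.array([1.0, -2.0, 3.0])

W, b = fit_pca_whitening(X)
Z = X @ W.T + b                          # apply x -> W x + b to every row

# The whitened data should have zero mean and identity covariance
# (up to floating-point error).
print(np.allclose(Z.mean(axis=0), 0.0, atol=1e-10))
print(np.allclose(np.cov(Z, rowvar=False), np.eye(3), atol=1e-8))
```

If some eigenvalues are zero or very close to zero, dividing by their square roots blows up; in that case one would typically drop those components or add a small constant to the eigenvalues, which is outside what the answer above covers.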

Update. See also this later thread for more details: What is the difference between ZCA whitening and PCA whitening?

I think you need to divide by the square roots of the eigenvalues, as it is a matter of scaling by SD, not variance. — Nick Cox, Apr 30 '14 at 13:52
@NickCox: yes, of course you are right. I corrected my answer. Thank you! — amoeba, Apr 30 '14 at 14:07
I have empirically verified the formula. Thanks for helping me! — Angelorf, Apr 30 '14 at 14:38
