
Admittedly I am now questioning my understanding of the output from PCA in sklearn.

At a high level, many tutorials present one of the benefits of PCA as producing uncorrelated components for use in downstream tasks. However, PCA in sklearn has a whiten parameter, which is False by default.

My question: Are the components returned by the default behavior of PCA in sklearn correlated, and only uncorrelated if we set whiten=True? To clarify, I am referring to the components returned by .transform or .fit_transform.

If the default behavior already returns uncorrelated (orthogonal) components, my follow-on question is to understand, at a high level, what whiten=True actually does.

Btibert3

1 Answer


By construction, "default" PCA (in scikit-learn or otherwise) always returns uncorrelated components.
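As a quick sanity check, here is a minimal sketch on synthetic data (the array shapes, random seed, and number of components are arbitrary choices for illustration): the covariance matrix of the scores returned by fit_transform with the default whiten=False is already (numerically) diagonal, i.e. the components are uncorrelated.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data with correlated features (purely for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))

# Default PCA: whiten=False.
scores = PCA(n_components=3).fit_transform(X)

# Covariance of the transformed data: off-diagonal entries are ~0,
# so the components are uncorrelated; the diagonal holds the
# (unequal) variances of each component.
print(np.round(np.cov(scores, rowvar=False), 6))
```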

Whitening also ensures that the different components of PCA have unit variance; this can be useful to improve the predictive accuracy of some algorithms.

To summarise:

whitening = decorrelation (e.g. PCA) + normalisation

But PCA can be performed on its own and does produce uncorrelated components.
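And a minimal comparison of whiten=False versus whiten=True, again on made-up data: whitening keeps the components uncorrelated and additionally rescales each one to unit variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))  # illustrative data

plain = PCA(n_components=3).fit_transform(X)
white = PCA(n_components=3, whiten=True).fit_transform(X)

# Both sets of scores are uncorrelated, but only the whitened ones
# have unit variance.
print(np.round(plain.var(axis=0, ddof=1), 3))  # the component variances (eigenvalues), not 1
print(np.round(white.var(axis=0, ddof=1), 3))  # approximately [1. 1. 1.]
```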

What can be confusing is scikit-learn's description of whiten:

When True (False by default) the components_ vectors are multiplied by the square root of n_samples and then divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.

The word uncorrelated should be removed for clarity (as the decorrelation comes from PCA, not from setting whiten to True).

David M.