I wanted to clarify a comment I left under @Peter-Flom's answer, but it is probably worth writing up as an answer in its own right. To what extent can you reduce dimensions by running PCA on nearly-orthogonal data? The answer is "it depends": on whether you perform the PCA on the correlation or covariance matrix.
If you run PCA on the correlation matrix, then, since this will differ only slightly from the identity matrix, there is a spherical symmetry which renders all directions "equally informative". (Rescaling your variables' variances to one before PCA is a mathematically equivalent approach that produces the same output.) The PCA output will identify some components with slightly lower variance than others, but this disparity is attributable (if we assume zero correlation in the population) to nothing more than chance variation in the sample, so it is not a good reason to jettison those components. Indeed, the disparity between the standard deviations of the components should shrink as the sample size increases. We can confirm this in a simulation.
set.seed(123)

# Simulate n observations of four independent (population-orthogonal)
# normal variables, then run PCA on the correlation matrix (cor=TRUE)
# or on the covariance matrix (cor=FALSE)
princompn <- function(n, sd1=1, sd2=1, sd3=1, sd4=1, cor=TRUE) {
  x1 <- rnorm(n, mean=0, sd=sd1)
  x2 <- rnorm(n, mean=0, sd=sd2)
  x3 <- rnorm(n, mean=0, sd=sd3)
  x4 <- rnorm(n, mean=0, sd=sd4)
  prcomp(cbind(x1, x2, x3, x4), scale.=cor)  # scale.=TRUE standardises
}
Output:
> pc100 <- princompn(100)
> summary(pc100)
Importance of components:
                          PC1    PC2    PC3    PC4
Standard deviation     1.0736 1.0243 0.9762 0.9193
Proportion of Variance 0.2882 0.2623 0.2382 0.2113
Cumulative Proportion  0.2882 0.5505 0.7887 1.0000
>
> pc1m <- princompn(1e6)
> summary(pc1m)
Importance of components:
                          PC1    PC2    PC3    PC4
Standard deviation     1.0008 1.0004 0.9998 0.9990
Proportion of Variance 0.2504 0.2502 0.2499 0.2495
Cumulative Proportion  0.2504 0.5006 0.7505 1.0000
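As a quick aside, we can verify the claim above that PCA on the correlation matrix is the same thing as PCA on variables rescaled to unit variance. A minimal sketch (the matrix X and the object names below are mine, not part of the simulation above):

# PCA on the correlation matrix coincides with PCA on standardised data
set.seed(456)
X <- matrix(rnorm(400), ncol=4)  # 100 observations of 4 variables

pcCor    <- prcomp(X, scale.=TRUE)  # PCA on the correlation matrix
pcScaled <- prcomp(scale(X))        # PCA after rescaling variances to 1

all.equal(pcCor$sdev, pcScaled$sdev)          # TRUE: identical components
all.equal(pcCor$sdev^2, eigen(cor(X))$values) # TRUE: the squared sdevs are
                                              # the eigenvalues of cor(X)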
However, if you run PCA on the covariance matrix instead of the correlation matrix (equivalently, if you don't rescale the standard deviations to one before applying PCA), then the answer depends on the spread of your variables. If your variables all have the same variance then we still have spherical symmetry, so there is no "privileged direction" and dimensional reduction can't be achieved.
> pcEqual <- princompn(n=1e6, sd1=4, sd2=4, sd3=4, sd4=4, cor=FALSE)
> summary(pcEqual)
Importance of components:
                          PC1    PC2    PC3    PC4
Standard deviation     4.0056 4.0010 3.9986 3.9936
Proportion of Variance 0.2507 0.2502 0.2499 0.2492
Cumulative Proportion  0.2507 0.5009 0.7508 1.0000
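The small spread in those standard deviations is again pure sampling noise: all four population eigenvalues here equal 4^2 = 16. A quick check on the pcEqual fit above (just printing, no new assumptions):

# All population eigenvalues are 4^2 = 16; the fitted values differ
# from 16 only by sampling noise, which shrinks as n grows
round(pcEqual$sdev^2, 2)

# With spherical symmetry the loadings single out no variable:
# the rotation is essentially an arbitrary orthonormal basis
round(pcEqual$rotation, 3)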
With a mixture of high- and low-variance variables, though, the symmetry is more like an ellipsoid, wide along some axes and thin along others. In this situation there will be high-variance components loading on the high-variance variables (the directions in which the ellipsoid is wide) and low-variance components loading on the low-variance variables (the directions in which it is narrow).
> pcHiLo <- princompn(n=1e6, sd1=4, sd2=4, sd3=1, sd4=1, cor=FALSE)
> summary(pcHiLo)
Importance of components:
                          PC1    PC2    PC3     PC4
Standard deviation     4.0018 3.9985 1.0016 1.00005
Proportion of Variance 0.4709 0.4702 0.0295 0.02941
Cumulative Proportion  0.4709 0.9411 0.9706 1.00000
> round(pcHiLo$rotation, 3)
      PC1   PC2    PC3    PC4
x1  0.460 0.888  0.000  0.000
x2 -0.888 0.460  0.000  0.000
x3  0.000 0.000 -0.747 -0.664
x4  0.000 0.000  0.664 -0.747
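Note the block structure of that rotation: the first two components live entirely in the (x1, x2) plane and the last two in the (x3, x4) plane, which is what lets us equate discarding PC3 and PC4 with discarding x3 and x4. A quick check on the pcHiLo fit (the cross-blocks should be near zero, up to sampling noise):

# The off-diagonal blocks of the rotation are (near) zero, so the
# span of PC1 and PC2 is (almost exactly) the span of x1 and x2
round(pcHiLo$rotation[3:4, 1:2], 6)  # x3, x4 barely load on PC1, PC2
round(pcHiLo$rotation[1:2, 3:4], 6)  # x1, x2 barely load on PC3, PC4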
If the variables have very different variances (geometrically an ellipsoid again, but now with every axis a different length), then orthogonality allows the first PC to load very heavily on the highest-variance variable, the second on the next-highest, and so on.
> pc1234 <- princompn(n=1e6, sd1=1, sd2=2, sd3=3, sd4=4, cor=FALSE)
> summary(pc1234)
Importance of components:
                          PC1    PC2    PC3     PC4
Standard deviation     3.9981 3.0031 1.9993 1.00033
Proportion of Variance 0.5328 0.3006 0.1332 0.03335
Cumulative Proportion  0.5328 0.8334 0.9667 1.00000
> round(pc1234$rotation, 3)
     PC1    PC2    PC3   PC4
x1 0.000  0.000 -0.001 1.000
x2 0.001 -0.001  1.000 0.001
x3 0.003 -1.000 -0.001 0.000
x4 1.000  0.003 -0.001 0.000
In the last two cases there were low-variance components you might consider throwing away to achieve dimensional reduction, but doing so is essentially equivalent to throwing away the lowest-variance variables in the first place (and would be exactly equivalent if the variables were perfectly uncorrelated in the sample). Orthogonality is what lets you identify each low-variance component with a low-variance variable, so if you intend to reduce dimensionality in this manner, it isn't clear you would benefit from using PCA to do so.
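We can make that equivalence concrete with the pc1234 fit. Since the rotation is orthogonal and all components were retained, the centred data can be recovered exactly from the scores, and each component then correlates (up to sign) almost perfectly with one original variable. A sketch, where Xc is my own name for the reconstruction:

# Recover the centred data from the scores (exact, because the
# rotation matrix is orthogonal and no components were dropped)
Xc <- pc1234$x %*% t(pc1234$rotation)

# Near-permutation structure, up to sign: PC1 ~ x4, PC2 ~ -x3,
# PC3 ~ x2, PC4 ~ x1, so dropping PC4 is in effect dropping x1
round(cor(pc1234$x, Xc), 2)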
Nota bene: the length of the discussion of the case where the variables are not rescaled to unit variance (i.e. using the covariance rather than the correlation matrix) should not be taken as an indication that this approach is somehow more important, and certainly not that it is "better". The symmetry of that situation is simply more subtle, so it required a longer discussion.