I wanted to clarify a comment I left under @Peter-Flom's answer, but it is probably worth writing up as an answer in its own right. To what extent can you reduce dimensions by running PCA on nearly-orthogonal data? The answer is "it depends": on whether you perform the PCA on the correlation or covariance matrix.
If you run PCA on the correlation matrix, then, since this will differ only slightly from the identity matrix, there is a spherical symmetry which renders all directions "equally informative". (Rescaling your variables' variances to one before PCA is a mathematically equivalent approach that produces the same output.) The PCA output will identify some components with slightly lower variance than others, but this disparity is attributable (if we assume zero correlation in the population) to nothing more than chance variation in the sample, so it is not a good reason to jettison those components. Indeed, the disparity between the standard deviations of the components should shrink as the sample size increases. We can confirm this in a simulation.
set.seed(123)

# Simulate n observations of four independent (population-orthogonal)
# normal variables, then run PCA on the correlation matrix (cor=TRUE)
# or on the covariance matrix (cor=FALSE)
princompn <- function(n, sd1=1, sd2=1, sd3=1, sd4=1, cor=TRUE) {
  x1 <- rnorm(n, mean=0, sd=sd1)
  x2 <- rnorm(n, mean=0, sd=sd2)
  x3 <- rnorm(n, mean=0, sd=sd3)
  x4 <- rnorm(n, mean=0, sd=sd4)
  prcomp(cbind(x1, x2, x3, x4), scale.=cor)  # scale.=TRUE standardises
}
Output:
> pc100 <- princompn(100)
> summary(pc100)
Importance of components:
                          PC1    PC2    PC3    PC4
Standard deviation     1.0736 1.0243 0.9762 0.9193
Proportion of Variance 0.2882 0.2623 0.2382 0.2113
Cumulative Proportion  0.2882 0.5505 0.7887 1.0000
>
> pc1m <- princompn(1e6)
> summary(pc1m)
Importance of components:
                          PC1    PC2    PC3    PC4
Standard deviation     1.0008 1.0004 0.9998 0.9990
Proportion of Variance 0.2504 0.2502 0.2499 0.2495
Cumulative Proportion  0.2504 0.5006 0.7505 1.0000
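As a quick aside, we can verify the claim above that PCA on the correlation matrix is the same thing as PCA on variables rescaled to unit variance. A minimal sketch (the matrix X and the object names below are mine, not part of the simulation above):

# PCA on the correlation matrix coincides with PCA on standardised data
set.seed(456)
X <- matrix(rnorm(400), ncol=4)  # 100 observations of 4 variables

pcCor    <- prcomp(X, scale.=TRUE)  # PCA on the correlation matrix
pcScaled <- prcomp(scale(X))        # PCA after rescaling variances to 1

all.equal(pcCor$sdev, pcScaled$sdev)          # TRUE: identical components
all.equal(pcCor$sdev^2, eigen(cor(X))$values) # TRUE: the squared sdevs are
                                              # the eigenvalues of cor(X)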
However, if you run PCA on the covariance matrix instead of the correlation matrix (equivalently, if you don't rescale the standard deviations to one before applying PCA), then the answer depends on the spread of your variables. If your variables all have the same variance then we still have spherical symmetry, so there is no "privileged direction" and dimensional reduction can't be achieved.
> pcEqual <- princompn(n=1e6, sd1=4, sd2=4, sd3=4, sd4=4, cor=FALSE)
> summary(pcEqual)
Importance of components:
                          PC1    PC2    PC3    PC4
Standard deviation     4.0056 4.0010 3.9986 3.9936
Proportion of Variance 0.2507 0.2502 0.2499 0.2492
Cumulative Proportion  0.2507 0.5009 0.7508 1.0000
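The small spread in those standard deviations is again pure sampling noise: all four population eigenvalues here equal 4^2 = 16. A quick check on the pcEqual fit above (just printing, no new assumptions):

# All population eigenvalues are 4^2 = 16; the fitted values differ
# from 16 only by sampling noise, which shrinks as n grows
round(pcEqual$sdev^2, 2)

# With spherical symmetry the loadings single out no variable:
# the rotation is essentially an arbitrary orthonormal basis
round(pcEqual$rotation, 3)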
With a mixture of high- and low-variance variables, though, the symmetry is more like an ellipsoid, wide along some axes and thin along others. In this situation there will be high-variance components loading on the high-variance variables (the directions in which the ellipsoid is wide) and low-variance components loading on the low-variance variables (the directions in which it is narrow).
> pcHiLo <- princompn(n=1e6, sd1=4, sd2=4, sd3=1, sd4=1, cor=FALSE)
> summary(pcHiLo)
Importance of components:
                          PC1    PC2    PC3     PC4
Standard deviation     4.0018 3.9985 1.0016 1.00005
Proportion of Variance 0.4709 0.4702 0.0295 0.02941
Cumulative Proportion  0.4709 0.9411 0.9706 1.00000
> round(pcHiLo$rotation, 3)
      PC1   PC2    PC3    PC4
x1  0.460 0.888  0.000  0.000
x2 -0.888 0.460  0.000  0.000
x3  0.000 0.000 -0.747 -0.664
x4  0.000 0.000  0.664 -0.747
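Note the block structure of that rotation: the first two components live entirely in the (x1, x2) plane and the last two in the (x3, x4) plane, which is what lets us equate discarding PC3 and PC4 with discarding x3 and x4. A quick check on the pcHiLo fit (the cross-blocks should be near zero, up to sampling noise):

# The off-diagonal blocks of the rotation are (near) zero, so the
# span of PC1 and PC2 is (almost exactly) the span of x1 and x2
round(pcHiLo$rotation[3:4, 1:2], 6)  # x3, x4 barely load on PC1, PC2
round(pcHiLo$rotation[1:2, 3:4], 6)  # x1, x2 barely load on PC3, PC4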
If the variables have very different variances (geometrically an ellipsoid again, but now with every axis a different length), then orthogonality allows the first PC to load very heavily on the highest-variance variable, the second on the next-highest, and so on.
> pc1234 <- princompn(n=1e6, sd1=1, sd2=2, sd3=3, sd4=4, cor=FALSE)
> summary(pc1234)
Importance of components:
                          PC1    PC2    PC3     PC4
Standard deviation     3.9981 3.0031 1.9993 1.00033
Proportion of Variance 0.5328 0.3006 0.1332 0.03335
Cumulative Proportion  0.5328 0.8334 0.9667 1.00000
> round(pc1234$rotation, 3)
     PC1    PC2    PC3   PC4
x1 0.000  0.000 -0.001 1.000
x2 0.001 -0.001  1.000 0.001
x3 0.003 -1.000 -0.001 0.000
x4 1.000  0.003 -0.001 0.000
In the last two cases there were low-variance components you might consider throwing away to achieve dimensional reduction, but doing so is essentially equivalent to throwing away the lowest-variance variables in the first place (and would be exactly equivalent if the variables were perfectly uncorrelated in the sample). Orthogonality is what lets you identify each low-variance component with a low-variance variable, so if you intend to reduce dimensionality in this manner, it isn't clear you would benefit from using PCA to do so.
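We can make that equivalence concrete with the pc1234 fit. Since the rotation is orthogonal and all components were retained, the centred data can be recovered exactly from the scores, and each component then correlates (up to sign) almost perfectly with one original variable. A sketch, where Xc is my own name for the reconstruction:

# Recover the centred data from the scores (exact, because the
# rotation matrix is orthogonal and no components were dropped)
Xc <- pc1234$x %*% t(pc1234$rotation)

# Near-permutation structure, up to sign: PC1 ~ x4, PC2 ~ -x3,
# PC3 ~ x2, PC4 ~ x1, so dropping PC4 is in effect dropping x1
round(cor(pc1234$x, Xc), 2)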
Nota bene: the length of the discussion of the case where the variables are not rescaled to unit variance (i.e. using the covariance rather than the correlation matrix) should not be taken as an indication that this approach is somehow more important, and certainly not that it is "better". The symmetry of that situation is simply more subtle, so it required a longer discussion.