Understanding PCA done on permuted data

Question

Apologies if this has been asked before; nothing turned up when I searched.

I'm noticing some very interesting behavior when I run PCA on pairs of dummy datasets I just invented, each of which is a permutation of a fixed set (here just the integers from 1 to 10). In R:

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(10, 2, 1, 5, 4, 3, 9, 8, 7, 6)
z <- c(8, 3, 2, 1, 4, 7, 9, 6, 5, 10)

df1 <- data.frame(x, y)
df2 <- data.frame(x, z)
df3 <- data.frame(y, z)

I then use prcomp:

> prcomp(df1)
Standard deviations (1, .., p=2):
[1] 3.415650 2.581989

Rotation (n x k) = (2 x 2):
        PC1        PC2
x 0.7071068 -0.7071068
y 0.7071068  0.7071068
> prcomp(df2)
Standard deviations (1, .., p=2):
[1] 3.681787 2.185813

Rotation (n x k) = (2 x 2):
        PC1        PC2
x 0.7071068 -0.7071068
z 0.7071068  0.7071068
> prcomp(df3)
Standard deviations (1, .., p=2):
[1] 3.858612 1.855921

Rotation (n x k) = (2 x 2):
         PC1        PC2
y -0.7071068 -0.7071068
z -0.7071068  0.7071068

So, every entry of every principal component is either $\frac{\sqrt{2}}{2}$ or $-\frac{\sqrt{2}}{2}$. I'm not sure exactly why this would be, although it makes a certain kind of sense: both variables 'contain the same data', and if we didn't see this behavior, we would be 'preferring' one variable over the other.

That's a very high-level and hand-wavy view of things, though. Also, if I try more than two variables at a time, this behavior disappears:

> prcomp(data.frame(x, y, z))
Standard deviations (1, .., p=3):
[1] 4.208109 2.603247 1.736355

Rotation (n x k) = (3 x 3):
        PC1        PC2        PC3
x 0.5003708  0.8107791 -0.3037538
y 0.5781928 -0.5740476 -0.5797951
z 0.6444549 -0.1144843  0.7560233

Can someone give me some insight into what's going on here?

This has little to do with permutations and everything to do with the fact that (1) your data are standardized to a common variance and (2) you are performing PCA in just two dimensions. See the extended explanation in my answer at https://stats.stackexchange.com/a/71303/919, especially the matrix $\mathbb Q$ at the very end. For comparison, generate perfectly random data for `x` and `y` and run `prcomp(df1, scale=TRUE)`: you will get the same result. — whuber, Sep 17 '19 at 12:12
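
The point in the comment can be checked with a short sketch. With two variables of equal variance, the covariance matrix has the form $\begin{pmatrix} a & b \\ b & a \end{pmatrix}$, whose eigenvectors are proportional to $(1, 1)$ and $(1, -1)$ whenever $b \neq 0$, with eigenvalues $a + b$ and $a - b$. Normalizing those vectors gives entries of $\pm\frac{\sqrt{2}}{2}$. The names `S`, `u`, `v` and `rot`, and the seed, are illustrative choices and not part of the original post; `scale.` is the full name of the `prcomp` argument that the comment abbreviates as `scale`.

```r
# Sketch: any two columns with equal variance give +/- 1/sqrt(2) loadings.
set.seed(1)  # arbitrary seed, used only so the run is reproducible

x <- 1:10
y <- sample(x)  # a random permutation of x, so var(y) equals var(x)

# Covariance matrix: equal diagonal entries a, off-diagonal entry b.
S <- cov(cbind(x, y))
print(S)

# Its eigenvectors are proportional to (1, 1) and (1, -1) when b is nonzero.
print(eigen(S)$vectors)

# Unrelated random data behave the same way once scaled to unit variance.
u <- rnorm(10)
v <- rnorm(10)
rot <- prcomp(data.frame(u, v), scale. = TRUE)$rotation
print(abs(rot))  # every entry equals 1/sqrt(2) up to floating-point error
```

With three or more variables, equal variances alone no longer pin down the eigenvectors, which is consistent with the three-variable output in the question.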
