
Assume I have a data set with $d$ dimensions (e.g. $d=20$) in which each dimension is i.i.d. $X_i \sim U[0;1]$ (alternatively, each dimension $X_i \sim \mathcal N[0;1]$), and the dimensions are independent of each other.

Now I draw a random object from this data set, take its $k=3\cdot d$ nearest neighbors, and compute PCA on this set. Contrary to what one might expect, the eigenvalues are not all the same. For uniform data in 20 dimensions, a typical result looks like this:

0.11952316626613427, 0.1151758808663646, 0.11170020254046743, 0.1019390988585198,
0.0924502502204256, 0.08716272453538032, 0.0782945015348525, 0.06965903935713605, 
0.06346159593226684, 0.054527131148532824, 0.05346303562884964, 0.04348400728546128, 
0.042304834600062985, 0.03229641081461124, 0.031532033468325706, 0.0266801529298156, 
0.020332085835946957, 0.01825531821510237, 0.01483790669963606, 0.0068195084468626625
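The experiment described above can be sketched roughly as follows. This is only an illustration under assumptions that the question does not state: the data set size `n = 5000`, Euclidean distance, the query point counting as one of its own neighbors, and eigenvalues rescaled to sum to $1$. The function name `knn_pca_eigenvalues` is made up for this sketch.

```python
import numpy as np

def knn_pca_eigenvalues(n=5000, d=20, seed=0):
    """PCA eigenvalues of the k = 3*d nearest neighbors of a random point.

    Assumptions (not given in the question): n points, Euclidean distance,
    and the query point is included in its own neighbor set.
    """
    rng = np.random.default_rng(seed)
    data = rng.uniform(0.0, 1.0, size=(n, d))   # each dimension i.i.d. U[0, 1]
    k = 3 * d

    query = data[rng.integers(n)]                # random object from the data set
    dist = np.linalg.norm(data - query, axis=1)  # Euclidean distances to the query
    neighbors = data[np.argsort(dist)[:k]]       # the k closest points

    cov = np.cov(neighbors, rowvar=False)        # d x d sample covariance
    eig = np.linalg.eigvalsh(cov)[::-1]          # eigenvalues, largest first
    return eig / eig.sum()                       # rescale to a total of 1

eigenvalues = knn_pca_eigenvalues()
print(eigenvalues)
```

Changing `seed` gives a sense of how much the spectrum varies from one neighborhood to the next.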

For normally distributed data, the results appear to be very similar, at least after rescaling them to a total sum of $1$ (the $\mathcal N[0;1]^d$ distribution clearly has a higher variance in the first place).

Is there any result that predicts this behavior? I'm looking for a test of whether the series of eigenvalues is reasonably regular, how many of the eigenvalues are as expected, and which ones differ significantly from the expected values.

For a given (small) sample size $k$, is there a result that tells whether a correlation coefficient between two variables is significant? Even i.i.d. variables will occasionally show a non-zero sample correlation for low $k$.
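One way to get a feel for this is to simulate the null case directly. The sketch below draws many pairs of independent standard normal samples of size $k$ and estimates the value of $|r|$ that is exceeded only with probability `alpha`. The normality of the variables, the number of trials, and the function name `null_correlation_threshold` are assumptions of this illustration. For normal data, the classical alternative is to use that $t = r\sqrt{(k-2)/(1-r^2)}$ follows a Student $t$ distribution with $k-2$ degrees of freedom under independence.

```python
import numpy as np

def null_correlation_threshold(k, alpha=0.05, trials=20000, seed=0):
    """Monte Carlo estimate of the |r| that independent variables exceed
    with probability alpha at sample size k (assumes standard normal data)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((trials, k))
    y = rng.standard_normal((trials, k))

    # Center each simulated sample, then compute Pearson's r row by row.
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    r = (xc * yc).sum(axis=1) / np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))

    # Two-sided threshold: the (1 - alpha) quantile of |r| under independence.
    return np.quantile(np.abs(r), 1.0 - alpha)

threshold = null_correlation_threshold(k=60)
print(threshold)
```

A sample correlation whose absolute value stays below this threshold is consistent with independence at the chosen level. When many pairs of dimensions are checked at once, some correction for multiple comparisons would be needed as well.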

Macro
Has QUIT--Anony-Mousse

1 Answer


There is a large literature on the distribution of eigenvalues of random matrices (try searching for "random matrix theory"). In particular, the Marchenko-Pastur distribution describes the limiting distribution of the eigenvalues of the sample covariance matrix of i.i.d. data with zero mean and equal variance, as the number of variables and the number of observations both go to infinity with their ratio converging to a constant. Closely related is Wigner's semicircle distribution.
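As a rough illustration, for $k$ observations of $d$ variables with variance $\sigma^2$ and ratio $d/k \le 1$, the Marchenko-Pastur law places the limiting eigenvalues on the interval $[\sigma^2(1-\sqrt{d/k})^2,\ \sigma^2(1+\sqrt{d/k})^2]$. The sketch below computes these edges for $d=20$, $k=60$ and prints them next to the eigenvalues of one simulated sample. It uses plain i.i.d. normal samples rather than nearest-neighbor sets, and the helper name `marchenko_pastur_edges` is made up for the example. Since the result is asymptotic, finite samples can fall slightly outside the interval.

```python
import numpy as np

def marchenko_pastur_edges(d, k, sigma2=1.0):
    """Support edges of the Marchenko-Pastur law for ratio d/k <= 1."""
    ratio = d / k
    lower = sigma2 * (1.0 - np.sqrt(ratio)) ** 2
    upper = sigma2 * (1.0 + np.sqrt(ratio)) ** 2
    return lower, upper

d, k = 20, 60
lower, upper = marchenko_pastur_edges(d, k)

# One simulated sample: k observations of d i.i.d. zero-mean, unit-variance variables.
rng = np.random.default_rng(0)
sample = rng.standard_normal((k, d))
eig = np.linalg.eigvalsh(sample.T @ sample / k)  # covariance with known zero mean

print("predicted support:", lower, upper)
print("observed range:   ", eig.min(), eig.max())
```

Even though all true variances are equal, the predicted largest and smallest eigenvalues differ by more than a factor of ten at this ratio, which is the kind of spread seen in the question.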

John