
I am interested in doing inference on the proportion of total variance explained by the first principal component, for a PCA based on the correlation matrix $R$. I want to know the (asymptotic) distribution of

$$\frac{\lambda^R_1}{\sum_i\lambda^R_i}=\frac{\lambda^R_1}{\operatorname{tr}(R)}=\frac{\lambda^R_1}{p}$$

where $\lambda^R_i$ is the $i$th eigenvalue of the sample correlation matrix $R$, and $p$ is the number of variables.

What is the distribution of this statistic, and are there methods available to form confidence intervals? I found surprisingly few references on this. I am particularly interested in a large-dimensional setup, where $p\to\infty$, $N\to\infty$ but $p/N\to c$, not necessarily the classical case where $p$ is fixed and $p/N\to 0$. From random matrix theory and the Marchenko-Pastur law, we know that the first eigenvalue will be biased upwards, but I am still unclear how this affects the distribution of $\lambda^R_1/p$ as $p\to\infty$.
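
To make the setting concrete, here is a minimal simulation sketch of the statistic I have in mind; the identity-correlation normal data and the particular choices of $N$, $p$ and the number of replications are illustrative assumptions only:

```python
# Minimal Monte Carlo sketch (not an analytic answer): simulate i.i.d.
# multivariate normal data with p/N close to a fixed ratio c, compute the
# first eigenvalue of the sample correlation matrix, and look at the
# empirical distribution of lambda_1 / p. The identity-correlation "null"
# is an assumption chosen only for illustration.
import numpy as np

def lambda1_share(N, p, rng):
    """Proportion of variance explained by PC1 of the sample correlation matrix."""
    X = rng.standard_normal((N, p))      # i.i.d. N(0, I): no true factor structure
    R = np.corrcoef(X, rowvar=False)     # p x p sample correlation matrix
    eigvals = np.linalg.eigvalsh(R)      # real symmetric -> eigvalsh (ascending)
    return eigvals[-1] / p               # trace(R) = p, so this is lambda_1 / p

rng = np.random.default_rng(0)
N, p, reps = 200, 100, 500               # p/N = c = 0.5
shares = np.array([lambda1_share(N, p, rng) for _ in range(reps)])

c = p / N
mp_edge = (1 + np.sqrt(c)) ** 2          # Marchenko-Pastur upper edge for lambda_1
print("mean lambda_1/p:", shares.mean())
print("sd   lambda_1/p:", shares.std(ddof=1))
print("MP edge / p    :", mp_edge / p)   # upward bias relative to the population value 1/p
```

Even in this null setting, $\lambda^R_1/p$ concentrates near the Marchenko-Pastur edge divided by $p$ rather than near $1/p$, which is the upward bias I mean; what I am after is the (asymptotic) distribution around that value and a way to build confidence intervals.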

Thanks!

Matifou
  • What are you assuming about the distribution of $R$ (or of the underlying distribution of the sample on which it is based)? – whuber Oct 10 '20 at 19:40
  • At this point, I am fine making assumptions such as $R$ coming from an i.i.d. sample from a multivariate normal, but if there are more general results, that's even better. – Matifou Oct 10 '20 at 19:56
  • Tbh, this sounds quite hopeless. The covariance matrix itself has a complicated distribution (Wishart in normal multivariate case, in more complicated cases likely intractable). Then on top of that you apply another highly complicated non-linear function, namely, the extraction of the first eigenvalue. I would start by trying to at least find the mean and variance of that distribution, even that may prove too hard. – Aleksejs Fomins Oct 12 '20 at 13:03
  • I wonder if bootstrapping would work here? I think you could check this easily enough through simulation. – Eoin Oct 12 '20 at 22:40
  • Thanks @AleksejsFomins, you can be more optimistic about research: the distribution of the first eigenvalue was already derived in 2001, even in the random-matrix context (Johnstone). But I haven't yet found the result for the scaled eigenvalue. And thanks Eoin too, but here bootstrapping can only be done along the T dimension (as I assume iid), and that's precisely the low dimension in my context :-( – Matifou Oct 12 '20 at 23:32
  • @Matifou Could you perhaps attach the results you already know of as links to your original question? Would be very interesting to read. Also, I'm a bit confused: haven't you just shown that the scaled eigenvalue is the first eigenvalue divided by the number of variables $p$? I thought that $p$ is known a priori, which makes it a constant, so the scaled eigenvalue should follow the same distribution (up to scaling). What am I missing? – Aleksejs Fomins Oct 13 '20 at 07:42
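
Following up on Eoin's suggestion above, here is a rough sketch of a row-wise (along-$N$) percentile bootstrap for $\lambda^R_1/p$. The factor-structure data and all tuning constants are illustrative assumptions, and, as noted in the comments, with $N$ small relative to $p$ this naive interval may well be unreliable, so treat it as a diagnostic rather than a method:

```python
# Rough sketch of a row-wise (along-N) percentile bootstrap for lambda_1/p.
# Data-generating process and constants below are illustrative assumptions.
import numpy as np

def pc1_share(X):
    """lambda_1 / p for the sample correlation matrix of X (rows = observations)."""
    R = np.corrcoef(X, rowvar=False)
    return np.linalg.eigvalsh(R)[-1] / X.shape[1]

def bootstrap_ci(X, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for lambda_1/p, resampling rows with replacement."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    stats = np.array([pc1_share(X[rng.integers(0, N, N)]) for _ in range(n_boot)])
    lo, hi = np.quantile(stats, [(1 - level) / 2, 1 - (1 - level) / 2])
    return pc1_share(X), (lo, hi)

# Illustrative data with a genuine common factor (an assumption, not from the question)
rng = np.random.default_rng(1)
N, p = 60, 120                                   # p > N, the regime the question cares about
factor = rng.standard_normal((N, 1))
X = factor @ rng.standard_normal((1, p)) + rng.standard_normal((N, p))

estimate, (lo, hi) = bootstrap_ci(X)
print(f"lambda_1/p = {estimate:.3f}, 95% percentile CI = ({lo:.3f}, {hi:.3f})")
```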

0 Answers