
I have a 73 by 426 matrix, where 73 is the number of my observations, and 426 is the number of my features/dimensions. When I perform the PCA, I expect to get a 73 by 426 score matrix; instead, I get a 73 by 72 matrix. I assume this is because I have more dimensions than observations. Is there a way to overcome this problem?

Here is my code in Matlab:

[coeff,score,~,~,explained] = pca(class1Sgn);
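
For reference, the same behavior shows up with random data of the same shape (a minimal sketch; `pca` requires the Statistics and Machine Learning Toolbox):

```matlab
% 73 observations, 426 features, filled with random numbers
X = randn(73, 426);
[coeff, score, ~, ~, explained] = pca(X);
size(score)       % 73-by-72: only n-1 = 72 components, not 426
size(coeff)       % 426-by-72: one loading vector per component
numel(explained)  % 72 variance percentages, one per component
```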
xava
  • Partial least squares is a kind of predictive PCA useful in your specific instance with more features than data. – Mike Hunter Apr 06 '17 at 16:29
  • Is this a pca function you wrote? Some environment details would be helpful. – Matt L. Apr 06 '17 at 16:31
  • No, this is the function provided by MATLAB. – xava Apr 06 '17 at 16:47
  • Why do you expect a 73 by 426 score matrix? – Aksakal Apr 06 '17 at 17:15
  • I think you will find the information you need in the linked thread. Please read it. If it isn't what you want / you still have a question afterwards, come back here & edit your question to state what you learned & what you still need to know. Then we can provide the information you need without just duplicating material elsewhere that already didn't help you. – gung - Reinstate Monica Apr 06 '17 at 20:08
  • Thank you for the answers; I have read the other threads and could see how to apply the given knowledge to my problem. It is not always apparent that problems are actually similar. – xava Apr 11 '17 at 12:56
  • As a follow-up, I understand why the nth component with n observations will be troublesome. But will earlier components also be affected? – Dave May 07 '18 at 19:32

2 Answers


Recall that principal components are, by construction, orthogonal. Your data matrix has rank at most 73, so you cannot derive more than 73 principal components from it. In fact, because pca centers the data (subtracting each column's mean) before computing components, you lose one more degree of freedom, leaving at most 72 PCs.
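
You can see the lost degree of freedom directly (a small sketch with random data; `pca` does this same centering internally):

```matlab
X  = randn(73, 426);
Xc = bsxfun(@minus, X, mean(X, 1));  % subtract each column's mean, as pca does
rank(X)                              % 73: full row rank (with probability 1)
rank(Xc)                             % 72: the centered rows sum to zero
```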

But what in the world do you plan on doing with 72 principal components?

I can't say whether to go this route without knowing your use case, but using only a handful of principal components (the first 5-10, for instance) out of your 72 is common practice. Things can go wrong in PCA, even for the first few PCs, if your eigenvalue/scree plot does not show those first PCs having much larger eigenvalues than the rest.

There is no way for PCA to give you a 73 by 426 score matrix. You could force factor analysis to produce one, but I don't think you would get anything useful. If you go the PCA route, you could bootstrap several estimates to test the stability of your first few principal components.
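
A hedged sketch of such a bootstrap check (not from the original post; `class1Sgn` is the question's data matrix, and the absolute-cosine similarity is one reasonable choice among several):

```matlab
X = class1Sgn;               % 73 x 426 data matrix
coeffFull = pca(X);          % loadings from the full sample
B = 200;                     % number of bootstrap replicates
n = size(X, 1);
sim = zeros(B, 1);
for b = 1:B
    idx = randi(n, n, 1);    % resample observations with replacement
    coeffB = pca(X(idx, :));
    % |cosine| between first PCs; the sign of a PC is arbitrary
    sim(b) = abs(coeffFull(:, 1)' * coeffB(:, 1));
end
histogram(sim)               % values near 1 suggest a stable first PC
```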

https://arxiv.org/pdf/0911.3827.pdf

https://stats.stackexchange.com/a/45859/69090


It took some digging, but the MATLAB documentation shows that pca uses the svd function to accomplish this task. The svd documentation states:

https://www.mathworks.com/help/matlab/ref/svd.html

For an m-by-n matrix A and svd(A,'econ'):

m > n — Only the first n columns of U are computed, and S is n-by-n.

m = n — svd(A,'econ') is equivalent to svd(A).

m < n — Only the first m columns of V are computed, and S is m-by-m.

Combine this with what @user3348782 said: centering in PCA costs one degree of freedom, so for m < n the score matrix is m-by-(m-1).
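
The m < n case is easy to check (a quick sketch using the question's dimensions):

```matlab
A = randn(73, 426);          % m = 73 observations, n = 426 features
[U, S, V] = svd(A, 'econ');
size(U)                      % 73-by-73
size(S)                      % 73-by-73
size(V)                      % 426-by-73: only the first m columns of V
% pca additionally centers A, dropping the usable components to m-1 = 72
```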

Matt L.