
I have a 73 by 426 matrix, where 73 is the number of my observations, and 426 is the number of my features/dimensions. When I perform the PCA, I expect to get a 73 by 426 score matrix; instead, I get a 73 by 72 matrix. I assume this is because I have more dimensions than observations. Is there a way to overcome this problem?

Here is my code in Matlab:

[coeff,score,~,~,explained] = pca(class1Sgn);
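
For reference, the same behavior shows up with random data of the same shape (a minimal sketch; `pca` requires the Statistics and Machine Learning Toolbox):

```matlab
% 73 observations, 426 features, filled with random numbers
X = randn(73, 426);
[coeff, score, ~, ~, explained] = pca(X);
size(score)       % 73-by-72: only n-1 = 72 components, not 426
size(coeff)       % 426-by-72: one loading vector per component
numel(explained)  % 72 variance percentages, one per component
```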
xava
  • Partial least squares is a kind of predictive PCA useful in your specific instance with more features than data. – Mike Hunter Apr 06 '17 at 16:29
  • Is this a pca function you wrote? Some environment details would be helpful. – Matt L. Apr 06 '17 at 16:31
  • No, this is the function provided by MATLAB. – xava Apr 06 '17 at 16:47
  • Why do you expect a 73 by 426 score matrix? – Aksakal Apr 06 '17 at 17:15
  • I think you will find the information you need in the linked thread. Please read it. If it isn't what you want / you still have a question afterwards, come back here & edit your question to state what you learned & what you still need to know. Then we can provide the information you need without just duplicating material elsewhere that already didn't help you. – gung - Reinstate Monica Apr 06 '17 at 20:08
  • Thank you for the answers; I have read the other threads and could see how to apply the given knowledge to my problem. It is not always apparent that problems are actually similar. – xava Apr 11 '17 at 12:56
  • As a follow-up, I understand why the nth component with n observations will be troublesome. But will earlier components also be affected? – Dave May 07 '18 at 19:32

2 Answers


Recall that principal components are, by construction, orthogonal. Your data matrix has rank at most 73, so you cannot derive more than 73 principal components from it. In fact, because pca centers the data (subtracting each column's mean) before computing components, you lose one more degree of freedom, leaving at most 72 PCs.
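
You can see the lost degree of freedom directly (a small sketch with random data; `pca` does this same centering internally):

```matlab
X  = randn(73, 426);
Xc = bsxfun(@minus, X, mean(X, 1));  % subtract each column's mean, as pca does
rank(X)                              % 73: full row rank (with probability 1)
rank(Xc)                             % 72: the centered rows sum to zero
```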

But what in the world do you plan on doing with 72 principal components?

I can't say whether to go this route without knowing your use case, but using only a handful of principal components (the first 5-10, for instance) out of your 72 is common practice. Things can go wrong in PCA, even for the first few PCs, if your eigenvalue/scree plot does not show those first PCs having much larger eigenvalues than the rest.

There is no way for PCA to give you a 73 by 426 score matrix. You could force factor analysis to produce one, but I don't think you would get anything useful. If you go the PCA route, you could bootstrap several estimates to test the stability of your first few principal components.
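
A hedged sketch of such a bootstrap check (not from the original post; `class1Sgn` is the question's data matrix, and the absolute-cosine similarity is one reasonable choice among several):

```matlab
X = class1Sgn;               % 73 x 426 data matrix
coeffFull = pca(X);          % loadings from the full sample
B = 200;                     % number of bootstrap replicates
n = size(X, 1);
sim = zeros(B, 1);
for b = 1:B
    idx = randi(n, n, 1);    % resample observations with replacement
    coeffB = pca(X(idx, :));
    % |cosine| between first PCs; the sign of a PC is arbitrary
    sim(b) = abs(coeffFull(:, 1)' * coeffB(:, 1));
end
histogram(sim)               % values near 1 suggest a stable first PC
```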

https://arxiv.org/pdf/0911.3827.pdf

https://stats.stackexchange.com/a/45859/69090


It took some digging, but the MATLAB documentation shows that pca uses the svd function to accomplish this task. The svd documentation states:

https://www.mathworks.com/help/matlab/ref/svd.html

For an m-by-n matrix A and svd(A,'econ'):

m > n — Only the first n columns of U are computed, and S is n-by-n.

m = n — svd(A,'econ') is equivalent to svd(A).

m < n — Only the first m columns of V are computed, and S is m-by-m.

Combine this with what @user3348782 said: centering in PCA costs one degree of freedom, so for m < n the score matrix is m-by-(m-1).
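
The m < n case is easy to check (a quick sketch using the question's dimensions):

```matlab
A = randn(73, 426);          % m = 73 observations, n = 426 features
[U, S, V] = svd(A, 'econ');
size(U)                      % 73-by-73
size(S)                      % 73-by-73
size(V)                      % 426-by-73: only the first m columns of V
% pca additionally centers A, dropping the usable components to m-1 = 72
```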

Matt L.