I have a $1600\times5000$ data matrix $X$ containing 1600 data points in 5000-dimensional space. Using MATLAB's built-in pca function, I get the loadings in coeff.
In theory, coeff*coeff' should give an almost-identity matrix. For example:
coeff = pca(rand(1000,1000));   % loadings of a random 1000-by-1000 data matrix
coeff*coeff';                   % close to the identity matrix
However, in my case, coeff*coeff' is far from the identity, with some of the diagonal entries as low as 0.01. As a result, if I wish to reconstruct my data points, even with all the PCs, I worry that the results may be lousy.
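For reference, here is a minimal sketch of the check I am describing. The random matrix is only a stand-in with the same shape as my data, not my actual X, and the names G, d, and H are introduced here purely for illustration.

```matlab
% Stand-in with the same shape as my data (assumption: not the real X)
X = rand(1600, 5000);

coeff = pca(X);        % loadings, one column per returned principal component
disp(size(coeff));     % number of variables by number of returned PCs

% The product I expected to be close to the identity
G = coeff * coeff';
d = diag(G);
fprintf('diag of coeff*coeff'': min = %.4f, max = %.4f\n', min(d), max(d));

% For comparison, the product taken in the other order
H = coeff' * coeff;
dev = max(max(abs(H - eye(size(H, 1)))));
fprintf('max deviation of coeff''*coeff from identity: %.2e\n', dev);
```

Printing size(coeff) alongside the two products shows how many components pca actually returned, which seems relevant when there are fewer data points than dimensions.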
What is the possible explanation for this? And is there a way I can get around this problem?