Gene data typically has far more dimensions than samples. This makes the sample covariance matrix singular, i.e. not positive definite. In R, when I try to use `princomp`, which performs an eigendecomposition of the covariance matrix, it complains that the sample size should be larger than the number of dimensions, whereas `prcomp` works fine since it performs an SVD of the data matrix. That much is understood.
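
To make the behaviour concrete, here is a minimal sketch that reproduces it. It assumes simulated Gaussian noise in place of real gene data, and the sizes `n = 10` and `p = 50` are arbitrary choices for illustration.

```r
set.seed(1)
n <- 10   # samples (assumed, for illustration)
p <- 50   # dimensions (assumed, for illustration)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# princomp stops with an error when there are fewer rows than columns;
# capture the message instead of aborting the script.
res <- tryCatch(princomp(X), error = function(e) conditionMessage(e))
print(res)

# prcomp uses an SVD of the centered data and runs without complaint.
pc <- prcomp(X)
dim(pc$rotation)   # p rows, n columns: at most n components are returned
pc$sdev            # the last value is numerically zero after centering
```

Because centering removes one degree of freedom, only `n - 1` of the returned components carry any variance.
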
There has been a lot of research on estimating the covariance and inverse covariance matrices for $p\gg n$ problems. I am trying to figure out what exactly happens to the "principal components" obtained by eigendecomposition of a non-positive-definite matrix, and how this can affect:
- Clustering
- Classification
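
As a starting point for the comparison, the sketch below eigendecomposes the singular sample covariance matrix directly and checks it against `prcomp`. It again assumes simulated Gaussian data with arbitrary sizes, and the threshold `1e-8` for calling an eigenvalue "zero" is an assumption, not a standard.

```r
set.seed(1)
n <- 10   # samples (assumed)
p <- 50   # dimensions (assumed)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

S  <- cov(X)                        # p x p, rank at most n - 1
ev <- eigen(S, symmetric = TRUE)

# Count eigenvalues that are numerically non-zero relative to the largest.
n_pos <- sum(ev$values > 1e-8 * ev$values[1])

pc <- prcomp(X)
k  <- n - 1

# Non-zero eigenvalues should match the squared prcomp standard deviations.
max_val_diff <- max(abs(ev$values[1:k] - pc$sdev[1:k]^2))

# Eigenvectors are only defined up to sign, so compare absolute inner products.
align <- abs(colSums(ev$vectors[, 1:k] * pc$rotation[, 1:k]))

n_pos
max_val_diff
align
```

In this setup the leading `n - 1` directions agree between the two routes; the remaining eigenvectors belong to a zero eigenvalue and are an arbitrary basis of the null space, so they should not be interpreted as components.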