I'm having trouble performing factor analysis on my dataset.
When I perform the factor analysis in SPSS (default settings), it works fine. Problem is, I need to do it programmatically (in Python). When I try using Python (MDP library) to do factor analysis on the same dataset, I get this error:
"The covariance matrix of the data is singular. Redundant dimensions need to be removed"
Upon looking into the MDP documentation, it says "...returns the Maximum A Posteriori estimate of the latent variables." Being a factor analysis newbie, I wasn't too clear on what this meant, but I tried changing the default extraction method in SPSS from "principal components" to "maximum likelihood". Then, in SPSS, I get the error:
"This matrix is not positive definite."
Are these two errors the same thing? Regardless, what can I do to fix my dataset so that the covariance matrix is not singular?
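For reference, the two messages can be compared numerically. A singular covariance matrix has at least one zero eigenvalue, so it is positive semidefinite but not positive definite. The sketch below assumes NumPy and uses made-up data in which the third column is the sum of the first two; the helper name `diagnose_covariance` is hypothetical and is not part of MDP or SPSS.

```python
import numpy as np

def diagnose_covariance(X, tol=1e-10):
    """Return (rank, smallest eigenvalue, is_positive_definite) for cov(X).

    Assumes observations are in rows and variables are in columns.
    """
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    eigvals = np.linalg.eigvalsh(cov)          # real eigenvalues, ascending order
    threshold = tol * max(eigvals[-1], 1.0)    # relative cutoff for "numerically zero"
    rank = int(np.sum(eigvals > threshold))
    is_pd = bool(eigvals[0] > threshold)
    return rank, float(eigvals[0]), is_pd

rng = np.random.default_rng(0)

# Illustrative data: the third variable is an exact linear combination of the others.
A = rng.normal(size=(50, 2))
X_redundant = np.column_stack([A, A[:, 0] + A[:, 1]])

# Illustrative data with three independent variables.
X_full = rng.normal(size=(50, 3))

rank_red, min_eig_red, pd_red = diagnose_covariance(X_redundant)
rank_full, min_eig_full, pd_full = diagnose_covariance(X_full)

print("redundant:", rank_red, pd_red)
print("independent:", rank_full, pd_full)
```

On the redundant data the covariance matrix loses one rank and fails the positive-definiteness check, which is consistent with both messages describing the same condition.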
Thanks!
edit: OK, so I was trying to keep things simplified, but perhaps it's better to just explain everything from the start.
I have a series of documents. Yes, I'm only using 9 documents as a simple test case, but my final objective will be to use it on a much larger corpus.
I've built a term-document matrix, applied tf-idf weighting, and performed SVD, mostly with the help of blog.josephwilk.net/.../latent-semantic-analysis-in-python.html
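The pipeline described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the linked post: the toy count matrix, the particular tf-idf variant, the choice of `k=2`, and the helper names `tfidf` and `truncated_svd_reconstruct` are all made up here, and only NumPy is used. It also shows that a rank-k reconstruction, with k smaller than the number of documents, gives a document covariance matrix of rank at most k, which is one way a singular covariance matrix can arise.

```python
import numpy as np

def tfidf(counts):
    """Apply one common tf-idf variant to a terms x documents count matrix."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    tf = counts / counts.sum(axis=0, keepdims=True)  # term frequency per document
    df = np.count_nonzero(counts, axis=1)            # documents containing each term
    idf = np.log(n_docs / df)
    return tf * idf[:, None]

def truncated_svd_reconstruct(M, k):
    """Rebuild M from its k largest singular values and vectors."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Toy corpus (assumption): 6 terms in rows, 4 documents in columns.
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 1, 1, 1],
])

weighted = tfidf(counts)
reconstructed = truncated_svd_reconstruct(weighted, k=2)

# Rank of the reconstruction, and of the covariance between documents (columns).
rank_reconstructed = int(np.linalg.matrix_rank(reconstructed, tol=1e-10))
doc_cov = np.cov(reconstructed, rowvar=False)
rank_doc_cov = int(np.linalg.matrix_rank(doc_cov, tol=1e-10))

print(reconstructed.shape, rank_reconstructed, rank_doc_cov)
```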
Now I have a reconstructed matrix, and I want to sort the documents into categories, so I tried using factor analysis. In fact, it seems to work: when I put it in SPSS, the factor loadings indicate that the documents are grouped the way I thought they should be, and the loadings are higher than if I hadn't performed SVD. (Although I think that, technically, SPSS is doing PCA even though it's under the 'Factor Analysis' heading.)
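For comparison with SPSS, principal-component loadings can be computed directly. The sketch below assumes that the SPSS result comes from an eigendecomposition of the correlation matrix, with loadings taken as eigenvectors scaled by the square roots of their eigenvalues; that is an assumption about the settings in use, and no rotation is applied. The function name `pca_loadings` and the toy data are hypothetical, and only NumPy is used. Unlike maximum-likelihood extraction, this approach never inverts the matrix, so it still runs when the matrix is singular.

```python
import numpy as np

def pca_loadings(X, n_components):
    """Unrotated principal-component loadings from the correlation matrix of X.

    Assumes observations are in rows and variables are in columns.
    Returns (loadings, eigenvalues) for the leading components.
    """
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # reorder to descending
    eigvals = eigvals[order][:n_components]
    eigvecs = eigvecs[:, order][:, :n_components]
    # Clip tiny negative eigenvalues that can appear from rounding.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return loadings, eigvals

# Toy data (assumption): four variables driven by two latent sources.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([
    a + 0.1 * rng.normal(size=200),
    a + 0.1 * rng.normal(size=200),
    b + 0.1 * rng.normal(size=200),
    b + 0.1 * rng.normal(size=200),
])

loadings, eigvals = pca_loadings(X, n_components=2)
communalities = (loadings ** 2).sum(axis=1)  # variance of each variable explained

print(np.round(loadings, 3))
print(np.round(communalities, 3))
```

Because the loadings are unrotated, the two components may mix the two groups; a rotation step, which this sketch omits, would be needed to get a cleaner grouping.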
I tried using MDP's PCANode, but that doesn't seem to give me anything close to what I want. Strangely, if I transpose my matrix, the factor analysis does work (it will group the terms, instead of the documents).
Hopefully this all makes a little more sense now...