
I'm kind of a noob to exploratory factor analysis (EFA) and am trying to use the FANode object in Python. This is from the MDP library. I am using it on survey data to see which variables are tied together. Whenever I run it on my data, I get the following error:

mdp.NodeException: The covariance matrix of the data is singular. Redundant dimensions need to be removed.

I was wondering if anyone else has encountered this error, and what it means to you. Again, I am quite the noob and would prefer to ask the community before diving into a textbook.

Andre Silva
  • Although some implementations of FA can cope with complete multicollinearity (singularity), that would be a palliative trick. _Theoretically_ FA requires nonsingularity. It assumes that _each variable_ has its _independent_ part, which is also independent of the common factor(s). (That part is [called unique factor](http://stats.stackexchange.com/a/95106/3277)). This assumption cannot hold under singularity. – ttnphns Jul 15 '14 at 20:31

2 Answers


The covariance matrix of the data being singular means that some variables in your data set are linear functions of one another. Most typically, the culprit is a full set of dummy variables corresponding to a categorical factor. You put categorical data in your tags, but you did not describe how exactly it shows up in your EFA. Technically speaking, categorical data violates the assumptions of EFA (multivariate normal data), so you will probably need to modify your analysis somehow.
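To see why a full set of dummies does this, here is a minimal sketch with plain NumPy (not MDP; the seed and sample size are just for illustration). Because the dummy columns for one categorical variable always sum to 1, they are linearly dependent, and their covariance matrix is rank-deficient:

```python
import numpy as np

rng = np.random.default_rng(0)
categories = rng.integers(0, 3, size=100)   # a 3-level categorical variable
dummies = np.eye(3)[categories]             # one dummy column per level; each row sums to 1

cov = np.cov(dummies, rowvar=False)
print(np.linalg.matrix_rank(cov))           # 2, not 3 -> the 3x3 covariance matrix is singular
print(np.linalg.det(cov))                   # ~0
```

Dropping any one of the dummy columns (the usual "reference level" convention in regression) removes the dependency.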

The error message, however, speaks of a somewhat poor implementation of EFA. There are EFA methods that can get away with a degenerate matrix, although of course it makes life harder for the methods that rely on inverses and determinants of the covariance matrix of the observed variables. A better implementation should crank through it with a warning.
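If you would rather remove the redundant dimensions yourself before handing the data to FANode, one possible sketch in plain NumPy (`drop_redundant_columns` is a hypothetical helper, not part of MDP) greedily keeps a maximal linearly independent subset of centered columns:

```python
import numpy as np

def drop_redundant_columns(X, tol=1e-10):
    """Return indices of a maximal linearly independent subset of columns.

    Covariance singularity is rank deficiency of the centered data matrix,
    so we center first and then greedily keep columns that raise the rank.
    """
    Xc = X - X.mean(axis=0)
    keep = []
    for j in range(Xc.shape[1]):
        if np.linalg.matrix_rank(Xc[:, keep + [j]], tol=tol) == len(keep) + 1:
            keep.append(j)
    return keep

# Toy data: the third column is an exact linear combination of the first two.
rng = np.random.default_rng(1)
a = rng.normal(size=(50, 1))
b = rng.normal(size=(50, 1))
X = np.hstack([a, b, a + b])

print(drop_redundant_columns(X))   # [0, 1] -> column 2 is redundant and would be dropped
```

After subsetting to the kept columns, the covariance matrix is full-rank and FANode should no longer raise the singularity error.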

StasK

That's a surprisingly clear error message. As it says, redundant dimensions need to be removed. Factor analysis doesn't work on extremely collinear variables. For example, consider this data frame:

```
1    1      1      1
2    2      2      2
3    3      3      3
4    4      4      4
5    5      5      5
6    6      6      6
7    7      7      7
8    8      8      8
9    9      9      9
```

All covariances for these data are equal (7.5). Therefore the determinant of the covariance matrix is 0, the matrix is not invertible (i.e., is singular), and factor analysis cannot proceed. This can be much less obvious in real datasets, but it generally means that one or more of your variables are so strongly related to others as to be completely redundant. Factor analysis should generally be performed on variables that are at least slightly distinct from one another, though smoothing methods exist to help handle extreme collinearity under certain circumstances.
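You can check this numerically with a quick NumPy sketch (assuming, as above, four columns that each run 1 through 9):

```python
import numpy as np

# Four identical columns, each 1..9 -- the toy data frame above.
X = np.tile(np.arange(1, 10).reshape(-1, 1), (1, 4))
cov = np.cov(X, rowvar=False)

print(cov[0, 0])           # 7.5 -- every entry of the covariance matrix is 7.5
print(np.linalg.det(cov))  # 0.0 -- singular, so the matrix has no inverse
```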

If by survey data you mean Likert scale ratings, see also Factor analysis of questionnaires composed of Likert items. There will be a lot of other issues to consider after addressing the singularity problem.

Nick Stauner