Recommended procedure for factor analysis on dichotomous data with R

Question

I have to run a factor analysis on a dataset made up of dichotomous variables (0 = yes, 1 = no), and I don't know if I'm on the right track.

Using tetrachoric(), I create a correlation matrix, on which I run fa(data, nfactors=1). The result is quite close to the results I get when using MixFactor, but it's not the same.

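Is this OK, or would you recommend another procedure?
Why does fa() work while factanal() produces an error? (Fehler in solve.default(cv) : System ist für den Rechner singulär: reziproke Konditionszahl = 4.22612e-18; in English: "Error in solve.default(cv) : system is computationally singular: reciprocal condition number = 4.22612e-18")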

I don't speak (German?), but the error looks like it's due to the tetrachoric matrix being singular (non-invertible). Even with a good-sized sample, some estimates of polychoric correlation matrices can fail to be proper correlation matrices. My hunch is that fa() is using principal components and factanal() is doing maximum likelihood FA, but that's not obvious to me from the documentation for fa(). — JMS, May 30 '11 at 15:59
@cada How many items/subjects do you have? And what method is implemented in MixFactor? — chl, May 30 '11 at 15:59
@cada Why are you running the factor analysis? If you are aiming to estimate participant ability, IRT may be a better approach. On the matter of fa versus factanal: fa uses minimum residual by default, while factanal uses maximum likelihood. — richiemorrisroe, May 30 '11 at 17:43
@chl: Way too few! I just started the survey, and so far there are only 45 subjects. I can't collect new data at the moment, which is why I'm trying to get as much of the R code done as possible... — cada, May 31 '11 at 07:58
@cada If you have more items than subjects, then you cannot use ML-based FA. Even if this is not the case, you should have a subjects-to-items ratio of at least about 10 to get meaningful results with ML. This also depends on the number of dimensions present in the data. Forget about IRT models with 45 subjects, unless you're willing to use MCMC. — chl, May 31 '11 at 08:00
@chl: OK, that means with my 22-item test I should aim for at least 220 subjects, right? (I talked about this problem with my supervisor before, but he said it would be OK... I don't think I'm going to have more than 100 subjects.) — cada, May 31 '11 at 08:04
MixFactor uses "mixed intercorrelation matrices" (I don't know if I translated that correctly), with the correlation type chosen according to each variable. For dichotomous variables, it also computes tetrachoric correlations. What happens afterwards isn't stated in the documentation... — cada, May 31 '11 at 08:08
@cada Any reason to hypothesize that your scale is strictly unidimensional (I see that you ask for 1 factor only)? — chl, May 31 '11 at 09:43
@chl: Yes, it's a validation of a questionnaire that showed a unidimensional structure in previous studies. I think I could also try a confirmatory FA, but I've never done this before and I definitely have to look into it in more detail first. — cada, May 31 '11 at 11:29
@cada A CFA with 45 subjects would be surrealist :-) I'll try to add my thoughts later. — chl, May 31 '11 at 11:47
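Whatever the cause (an improper tetrachoric estimate, a redundant item, or more items than subjects), a singular correlation matrix can be spotted before calling factanal() by looking at its smallest eigenvalue and its reciprocal condition number. The sketch below uses only base R; the helper name check_cor_matrix() and both toy data sets are hypothetical illustrations, not part of any package. Case 2 uses continuous random data as a stand-in, because binary columns with only 10 rows can easily be constant and give NA correlations.

```r
# Hypothetical helper: can this correlation matrix be inverted safely?
check_cor_matrix <- function(R, tol = 1e-8) {
  ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
  list(min_eigenvalue    = min(ev),        # about 0 or negative: not invertible
       positive_definite = min(ev) > tol,
       reciprocal_cond   = rcond(R))       # tiny values make solve() fail
}

# Case 1 (toy data): item 3 is an exact copy of item 1, 45 subjects
set.seed(1)
x1 <- matrix(rbinom(45 * 3, size = 1, prob = 0.5), ncol = 3)
x1[, 3] <- x1[, 1]
res1 <- check_cor_matrix(cor(x1))

# Case 2 (toy data): fewer subjects (10) than items (22)
set.seed(2)
x2 <- matrix(rnorm(10 * 22), nrow = 10, ncol = 22)
res2 <- check_cor_matrix(cor(x2))
rank2 <- qr(cor(x2))$rank                  # at most 10 - 1 = 9, out of 22

str(res1)
str(res2)
cat("rank of the 22 x 22 correlation matrix:", rank2, "\n")
```

In both cases the smallest eigenvalue is numerically zero, which is exactly the situation in which solve() inside factanal() reports a tiny reciprocal condition number.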

score 12 · Answer 1 · edited Mar 24 '21 at 12:47

To sum up, with n = 45 subjects you're left with correlation-based and multivariate descriptive approaches. However, since this questionnaire is supposed to be unidimensional, these approaches are a good starting point anyway.

What I would do:

Compute pairwise correlations for your 22 items and report the range and the median. This will indicate the relative consistency of the observed item responses (correlations above 0.3 are generally considered indicative of good convergent validity, but of course the precision of this estimate depends on the sample size). An alternative way to study the internal consistency of the questionnaire is to compute Cronbach's alpha, although with n = 45 the associated confidence interval (use the bootstrap for that) will be relatively wide.
Compute the point-biserial correlation between each item and the summated scale score; it will give you an idea of the discriminative power of each item (like loadings in FA), where values above 0.3 indicate a satisfactory relationship between an item and its corresponding scale.
Use a PCA to summarize the correlation matrix (it yields an equivalent interpretation to what would be obtained from a multiple correspondence analysis in case of dichotomously scored items). If your instrument behaves as a unidimensional scale for your sample, you should observe a dominant axis of variation (as reflected by the first eigenvalue).
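The three steps above can be sketched in base R as follows. The data are simulated from an assumed one-factor logistic model purely for illustration (45 subjects, 22 items, as in the question); with real data, replace `items` with your 0/1 response matrix. The helper cronbach_alpha() is written out by hand here as an illustration; the psych and ltm packages provide their own functions for this.

```r
set.seed(123)
n <- 45; k <- 22
theta <- rnorm(n)                                 # assumed latent trait
b <- seq(-1, 1, length.out = k)                   # assumed item difficulties
prob <- plogis(1.5 * outer(theta, b, "-"))        # n x k response probabilities
items <- matrix(rbinom(n * k, size = 1, prob = prob), nrow = n, ncol = k)

## 1. Pairwise inter-item correlations: range and median
R <- cor(items)
r_off <- R[upper.tri(R)]
print(round(c(min = min(r_off), median = median(r_off), max = max(r_off)), 2))

## 1b. Cronbach's alpha with a percentile bootstrap confidence interval
cronbach_alpha <- function(x) {
  n_items <- ncol(x)
  n_items / (n_items - 1) * (1 - sum(apply(x, 2, var)) / var(rowSums(x)))
}
alpha_hat <- cronbach_alpha(items)
boot_alpha <- replicate(2000, cronbach_alpha(items[sample(n, replace = TRUE), ]))
alpha_ci <- quantile(boot_alpha, c(0.025, 0.975))
print(round(c(alpha = alpha_hat, alpha_ci), 3))

## 2. Point-biserial correlations: each 0/1 item vs. the summated score
total <- rowSums(items)
item_total <- apply(items, 2, function(col) cor(col, total))
print(round(item_total, 2))

## 3. PCA of the correlation matrix: look for a dominant first eigenvalue
ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
print(round(head(ev, 3), 2))
cat("first/second eigenvalue ratio:", round(ev[1] / ev[2], 2), "\n")
```

The item-total correlations here include each item in the total, as described above; a common variant (the "corrected" item-total correlation) removes the item from the total first and gives somewhat lower values.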

Should you want to use R, you will find useful functions in the ltm and psych packages; browse the CRAN Psychometrics Task View for more packages. If you get 100 subjects, you can try some CFA or SEM analysis with bootstrap confidence intervals. (Bear in mind that loadings must be quite large before you can conclude there is a significant correlation between an item and its factor, since they should be at least twice the standard error of a reliable correlation coefficient, i.e. at least $2(1-r^2)/\sqrt{n}$.)
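Plugging a few values into the rule of thumb quoted above shows how the threshold shrinks as n grows. A tiny base-R sketch; two_se() is a hypothetical helper name, and the r and n values are chosen only for illustration.

```r
# Twice the approximate standard error of a correlation r at sample size n
two_se <- function(r, n) 2 * (1 - r^2) / sqrt(n)

grid <- expand.grid(r = c(0.3, 0.5, 0.7), n = c(45, 100))
grid$two_se <- round(two_se(grid$r, grid$n), 3)
print(grid)
```

For example, with r = 0.3 the quantity is 2 * 0.91 / 10 = 0.182 at n = 100, and larger at n = 45.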

Thank you very much! Your answer is so detailed and really really helpful! Thank you! — cada, Jun 07 '11 at 09:43

BurninLeo · Answer 2 · 2013-01-24T08:58:44.373

This thread ranks highly on Google for the "System ist für den Rechner singulär: reziproke Konditionszahl" error from factanal (in English: "system is computationally singular: reciprocal condition number"), so I'll add a comment:

When the correlation matrix is calculated beforehand (e.g., to handle missing values by pairwise deletion), make sure that factanal() does not mistake the matrix for the data to analyze (https://stat.ethz.ch/pipermail/r-help/2007-October/142567.html).

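# Beforehand (for example): correlation matrix with pairwise deletion
matrix <- cor(data, use = "pairwise.complete.obs")
# WRONG: factanal() treats `matrix` as the raw data to analyze
factanal(matrix, 3, rotation = "varimax")
# RIGHT: pass it explicitly as the covariance/correlation matrix
factanal(covmat = matrix, factors = 3, rotation = "varimax")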
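A self-contained base-R version of the same fix, using simulated data (9 variables, 3 assumed factors with 3 items each, and some values set to missing) so the calls can be run as-is. Setting n.obs is an addition not in the comment above: without it, factanal() cannot compute its chi-square test, and with pairwise deletion the total n is only an approximation of the per-pair sample sizes.

```r
set.seed(42)
n <- 300
f <- matrix(rnorm(n * 3), nrow = n, ncol = 3)             # latent factor scores
L <- kronecker(diag(3), matrix(0.7, nrow = 3, ncol = 1))  # 9 x 3 loading pattern
dat <- f %*% t(L) + matrix(rnorm(n * 9, sd = 0.6), nrow = n, ncol = 9)
dat[sample(length(dat), 50)] <- NA                        # scattered missing values

R <- cor(dat, use = "pairwise.complete.obs")

# Passed positionally, R is treated as 9 rows of raw data; the covariance of
# that 9 x 9 "data" is singular, so this call typically fails in solve().
wrong <- tryCatch(factanal(R, factors = 3, rotation = "varimax"),
                  error = function(e) conditionMessage(e))

# Passed as covmat, R is used as the correlation matrix to factor.
fit <- factanal(covmat = R, factors = 3, rotation = "varimax", n.obs = n)
print(fit$loadings, cutoff = 0.3)
```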

