11

I have a set of 20 variables that I have put through factor analysis in SPSS. For purposes of the research, I need to develop 6 factors. SPSS has shown that 8 variables (out of 20) either load weakly or load about equally on several factors, so I have removed them. The remaining 12 variables load in pairs on the 6 factors, which is exactly the structure I wanted. However, one of the professors working with me wants me to justify why (or under what conditions) it is appropriate to keep only 2 items per factor, since it is commonly held that factor analysis results are useful only with 3 or more items loading per factor.

Can anyone help me out with this issue, preferably with a published reference as well?

ttnphns
Mitja
  • "At least 3 items per factor" is a warranted recommendation. If, after factor rotation, a factor has only two or one items, either 1) get more variables which you expect to be loaded by that factor, or 2) redo the analysis and extract fewer factors, or 3) leave the results as is but don't interpret the "needy" factor, saying "I believe that factor exists, but since it isn't currently supported by enough items I drop it from interpretation and from results". All three recommendations are different, though. – ttnphns Aug 21 '17 at 07:44
  • See also, in addition to the answers here, stats.stackexchange.com/a/198684/3277 (point 5) for why "At least 3 loaded items per factor" is a reasonable requirement. – ttnphns Aug 21 '17 at 08:06
  • A single item factor is also acceptable if that item has a higher factor loading. – Meera Gang Dec 18 '16 at 15:54

3 Answers

15

Two or three items per factor is a question of identification of your CFA (confirmatory FA) model.

Let us for simplicity assume that the model is identified by setting the variance of each factor to 1. Assume also that there are no correlated measurement errors.

A single factor model with two items has two loadings and two error variances to be estimated = 4 parameters, but there are only 3 non-trivial entries in the variance-covariance matrix, so you don't have enough information to estimate the four parameters that you need.
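
The counting argument can be sketched in a few lines of Python. This is only an illustration under the assumptions stated above (factor variance fixed to 1, no correlated measurement errors); the function name `single_factor_df` is made up for this example. A non-negative count is necessary for identification but does not by itself prove it.

```python
def single_factor_df(n_items: int) -> int:
    """Distinct covariance entries minus free parameters for a one-factor model.

    Assumes the factor variance is fixed to 1 and measurement errors
    are uncorrelated (the setup described in the answer).
    """
    # Distinct entries of the variance-covariance matrix:
    # p variances plus p(p-1)/2 covariances.
    n_moments = n_items * (n_items + 1) // 2
    # Free parameters: one loading and one error variance per item.
    n_params = 2 * n_items
    return n_moments - n_params


for p in (2, 3, 4):
    # Negative: not enough information; zero: as many entries as parameters;
    # positive: more entries than parameters.
    print(p, single_factor_df(p))
```

With two items the count is negative, which is the shortfall described above.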

A single factor model with three items has three loadings and three error variances. The variance-covariance matrix has six distinct entries, and careful analytic examination shows that the model is exactly identified, and you can algebraically express the parameter estimates as functions of the variance-covariance matrix entries. With more items per single factor, you have an overidentified model (more distinct variance-covariance entries than parameters, hence positive degrees of freedom), which usually means you are good to go.
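
For the three-item case, the algebraic solution can be written out as follows. The notation is an assumption of this sketch, not the answer's: λ_i for the loadings, θ_i for the error variances, and σ_ij for the entries of the variance-covariance matrix, with the factor variance fixed to 1.

```latex
\begin{aligned}
\sigma_{ii} &= \lambda_i^2 + \theta_i, \qquad
\sigma_{ij} = \lambda_i \lambda_j \quad (i \neq j), \\
\lambda_1^2 &= \frac{\sigma_{12}\,\sigma_{13}}{\sigma_{23}}, \qquad
\lambda_2^2 = \frac{\sigma_{12}\,\sigma_{23}}{\sigma_{13}}, \qquad
\lambda_3^2 = \frac{\sigma_{13}\,\sigma_{23}}{\sigma_{12}}, \\
\theta_i &= \sigma_{ii} - \lambda_i^2 .
\end{aligned}
```

These expressions require all three covariances to be non-zero, and they determine the loadings only up to a common sign flip.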

With more than one factor, the CFA model is always identified with 3+ items per factor (because a simple measurement model is identified for each factor, so roughly speaking you can get predictions for each factor and estimate their covariances based on that). However, a CFA with two items per factor is identified provided that each factor has a non-zero covariance with at least one other factor in the population. (Otherwise, the factor in question falls out of the system, and a two-item single factor model is not identified.) The proof of identification is rather technical, and requires a good understanding of matrix algebra.
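
A small numeric sketch may make the two-indicator case more concrete. It is not the formal proof: it only shows that, for two correlated factors with two items each, the parameters can be solved back from the implied covariance matrix. The function names, the example values, and the convention that loadings are positive are all assumptions made for illustration.

```python
import math


def implied_cov(l1, l2, l3, l4, phi, errs):
    """Model-implied covariance matrix for two factors with variances fixed to 1.

    Items 1-2 load on factor A, items 3-4 on factor B; phi is the factor
    covariance and errs holds the four error variances.
    """
    lam = [(l1, 0.0), (l2, 0.0), (0.0, l3), (0.0, l4)]
    cov = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        a_i, b_i = lam[i]
        for j in range(4):
            a_j, b_j = lam[j]
            cov[i][j] = a_i * a_j + b_i * b_j + phi * (a_i * b_j + b_i * a_j)
        cov[i][i] += errs[i]
    return cov


def recover(cov):
    """Solve for loadings, factor covariance, and error variances.

    Assumes positive loadings. The divisions by cross-factor covariances
    fail when phi == 0, which is exactly the non-identified case.
    """
    l1 = math.sqrt(cov[0][1] * cov[0][2] / cov[1][2])
    l2 = cov[0][1] / l1
    l3 = math.sqrt(cov[2][3] * cov[0][2] / cov[0][3])
    l4 = cov[2][3] / l3
    phi = cov[0][2] / (l1 * l3)
    loadings = [l1, l2, l3, l4]
    errs = [cov[i][i] - loadings[i] ** 2 for i in range(4)]
    return loadings, phi, errs


# Hypothetical population values, chosen only for the demonstration.
cov = implied_cov(0.8, 0.7, 0.6, 0.9, 0.4, [0.36, 0.51, 0.64, 0.19])
print(recover(cov))
```

Setting `phi` to 0 makes every cross-factor covariance zero, so `recover` raises a `ZeroDivisionError`: each factor is then left with a single covariance between its two items, which is the unidentified two-item model.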

Bollen (1989) fully and thoroughly discusses the issues of identification of CFA models in chapter 7. See p. 244 specifically regarding three- and two-indicator rules.

ttnphns
StasK
  • 1
    This was a very apt answer. I would only comment (for the OP's sake) that the OP asked about exploratory FA (EFA). It is logical that EFA should have "3+ loaded items per factor" since CFA expects it; you just didn't mention that in your answer. – ttnphns Aug 21 '17 at 08:03
4

I have never heard of the "3 items per factor" criterion. I would reverse the question and ask your professor to come up with a sound reference for this statement.

Besides that, "for purposes of the research, I need to develop 6 factors." is a weird thing to say.

The basic purpose of factor analysis is to 1) find out how many factors (often psychological traits) underlie a (larger) number of measured variables and then 2), based on the factor loadings, describe what these factors really are.

You don't "develop" 6 factors, you're "trying to measure" 6 factors.

However, cross loadings (variables loaded by several factors) are often an indication that the factors are "trying to correlate" with each other. This makes sense, since we know that basically everything correlates with everything in the real world. Accounting for this in your analysis by using an oblique rotation (instead of the orthogonal varimax) often gets rid of many cross loadings. IMHO, it is more sound theoretically too.

Give that a shot; you may end up with more items per factor. That may (partly) solve your problem too.

ttnphns
RubenGeert
  • Thank you very much for your comment. I can explain the choice of six factors with the model I am using, and my professor is not against a 6-factor solution; however, he wants an explanation of when it is OK to use a factor analysis that has only 2 items per factor. That remains the question. – Mitja Nov 26 '12 at 17:08
  • Welcome to the site, @pythonforspss.org, there's a lot of good info here, +1. A couple of notes: I have heard it said several times that you need at least 3 variables per factor, but I don't know what the (or if there actually is any) substantive reason for this rule. I edited the OP's Q to make the English smoother; I put in the phrase you quote to replace what was there beforehand. This may well not have been ideal (I wasn't sure how to translate what I thought the OP might be trying to say), but if so it's my fault, not Mitja's. Remember that English is not the 1st language of many users. – gung - Reinstate Monica Nov 26 '12 at 17:35
  • The "three items per factor" rule is a common belief, and tends to cause problems at the review stage (precisely because it is a common belief). That being said, if your communalities are high (>0.7) then you probably don't have an issue. – richiemorrisroe Nov 26 '12 at 17:40
  • My communalities are 0.5 or higher ... – Mitja Nov 26 '12 at 19:46
  • `factors are "trying to correlate" with each other` is a mystic formulation. Factors correlate or don't correlate according to how we rotate (model) them. Quite high "cross-loadings" are possible with orthogonal factors with a variable having high communality. – ttnphns Aug 21 '17 at 08:07
1

I have the same problem now. Here is an article which recommends using at least 3 items per factor. In exceptional cases, however, you might use two items per factor (p. 60). http://www.sajip.co.za/index.php/sajip/article/download/168/165 My case seems to be exceptional, since there are only two variables in my web-based experiment that provide information on the player's strategy and strategy power. Maybe it could help you, too, to justify the use of 2 items for some factors.

Momo
eugentango
    This website has a number of references supporting the minimum of three variables per factor rule: https://www.encorewiki.org/display/~nzhao/The+Minimum+Sample+Size+in+Factor+Analysis –  Sep 16 '14 at 18:48