
Unlike principal components analysis, the solutions to factor analysis models are not necessarily nested. That is, the loadings (for example) for the first factor won't necessarily be identical when only the first factor is extracted vs. when the first two factors are.

With that in mind, consider a case where you have a set of manifest variables that are highly correlated and (by theoretical knowledge of their content) should be driven by a single factor. Imagine that exploratory factor analyses (by whichever criterion you prefer: parallel analysis, scree plot, eigenvalues > 1, etc.) strongly suggest that there are $2$ factors: a large primary factor and a small secondary factor. You are interested in using the manifest variables and the factor solution to estimate (i.e., get factor scores for) participants' values on the first factor. In this scenario, would it be better to:

  1. Fit a factor model to extract only $1$ factor, and get factor scores (etc.), or
  2. Fit a factor model to extract both factors, get factor scores for both, but throw away / ignore the scores for the second factor?

For whichever is the better practice, why? Is there any research on this issue?
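
To make the setup concrete, here is a minimal R sketch of the scenario (simulated data; the loading values and the choice of base-R `factanal()` with ML extraction and regression factor scores are only illustrative assumptions):

```r
set.seed(1)
n <- 500
g <- rnorm(n)   # large primary factor
s <- rnorm(n)   # small secondary factor
dat <- data.frame(
  x1 = 0.8 * g + 0.3 * rnorm(n),
  x2 = 0.8 * g + 0.3 * rnorm(n),
  x3 = 0.8 * g + 0.3 * rnorm(n),
  x4 = 0.7 * g + 0.4 * s + 0.3 * rnorm(n),
  x5 = 0.7 * g + 0.4 * s + 0.3 * rnorm(n),
  x6 = 0.7 * g + 0.4 * s + 0.3 * rnorm(n)
)

# Option 1: extract only the single dominant factor
f1 <- factanal(dat, factors = 1, scores = "regression")

# Option 2: extract both factors (unrotated), then use only factor 1
f2 <- factanal(dat, factors = 2, scores = "regression", rotation = "none")

# The first-factor loadings need not be identical across the two solutions...
cbind(one_factor = f1$loadings[, 1], two_factor = f2$loadings[, 1])

# ...and neither are the factor-1 scores, although they are typically very close
cor(f1$scores[, 1], f2$scores[, 1])
```

With data like these, the two sets of factor-1 scores typically correlate almost perfectly, which is in line with the comment below that the solutions tend to be "almost nested".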

gung - Reinstate Monica
  • One should not rely only on pre-analytic heuristic devices when choosing the number of factors to extract. Check the reproduction of correlations (how much better is it when you extract 2 factors instead of 1?). How are the correlation residuals distributed in the two solutions? (They should normally be roughly uniform or normal, with no long/fat right tail; see the residual-check sketch after these comments.) If the data are normal, tests of fit and standard errors of loadings are computable (with ML extraction). Based on all that plus interpretability, one might decide whether way (1) or (2) is better in the current case. – ttnphns Oct 08 '15 at 20:29
  • (cont.) Ultimately, only new samples/confirmatory FA can judge the dilemma to the end. One notion, however: _if_ the 2nd factor is really weak (small SS loadings after extraction), then I do not expect the two solutions (and hence the factor scores for factor 1) to differ greatly. (I'm saying this without much confidence because I'm commenting without having checked thoroughly. But, logically, if the factor plane is ready to degenerate into a line, the results should be nearly the same as with just the line...) – ttnphns Oct 08 '15 at 20:40
  • The Q title `Is it always better to extract more factors when they exist?` is not very clear. It is always better to extract as many as there exist. Underfitting or overfitting both distort the "true" latent structure, due to the multivariate and non-nested nature of the analysis you mentioned. The problem is that we don't know exactly how many factors there are in our data, and whether these data have as many as the population has. – ttnphns Oct 08 '15 at 20:51
  • @ttnphns, your last comment gets to the heart of the question, I think. Assume whatever methods you like to convince you that there really are 2 factors, 1 of which accounts for almost all the shared variance, up to & including CFA on a fresh sample. The fit w/ 2 is negligibly better, but better. This is a fake & contrived example for the sake of highlighting the issue. The underlying issue could just as well be using 2 out of 5. – gung - Reinstate Monica Oct 08 '15 at 21:51
  • The question is, since the solutions are not nested, which approach gives you a better estimate of each participant's score on the latent variable, & why? Is using only 1 biased, does it vary further from the true value, or both? Does that happen because using only 1 is "underfitting"? What does that mean exactly? Is it possible to characterize the nature of the distortion? Alternatively, I might have expected that extracting only 1 allows the analysis to focus all of its degrees of freedom on getting the 1st as accurate as possible. – gung - Reinstate Monica Oct 08 '15 at 21:53
  • `which approach gives you a better estimate of each participant's score on the latent variable` I have no way to assess that, because true factor values are [not estimable](http://stats.stackexchange.com/a/126985/3277). FA does not seek to estimate its [model](http://stats.stackexchange.com/a/94104/3277) directly because, due to the many unique factors, it has an excess of parameters. All FA can do, and what it does do, is _fit the correlation matrix_. So, up to the loadings (coefficients `a` in the model), it is estimable. But the factor values `F` in the model remain unknown. – ttnphns Oct 08 '15 at 23:27
  • (cont.) We can only surrogate them with reasonable approximate devices called "factor scores" that correlate with them, and that is all. Factor scores thus bear no relation to some mythical true factor pre-existing any extraction; they pertain only to the factor-as-we-extracted-it, i.e., to the loadings, which in turn depend on the number of factors we choose to extract. The question of exploratory FA goodness-of-fit or prediction is settled at the level of the correlation matrix, not at the level of individuals (which is where FA differs from regression). – ttnphns Oct 08 '15 at 23:27
  • (cont.) In short, there can be no better or worse factor scores when judging from inside FA (i.e., without external criteria); but there can be better or worse factors and sets of factors: how strongly/evenly they restore the correlations (scores are not needed here) and how nicely they can be interpreted. – ttnphns Oct 08 '15 at 23:28
  • (cont.) I'm summarizing what I've said in the last 3 comments schematically as the chain of relations: `FS--EF-?-TF--[FV]`, where `FS=f.scores; EF=estimated(extracted) factors; TF=true("real") factor; FV=its f.values`. `[.]` indicates that the bracketed item is a [noumenon](https://en.wikipedia.org/wiki/Noumenon). `?` indicates where the fit (reconstruction) nests. Note that, according to the scheme, `FS` and `FV` cannot contact. If you think, maybe, that FA is something like `EF--FS-?-FV--TF`, you are mistaken. – ttnphns Oct 09 '15 at 06:30
  • +1, interesting question. My feeling is that it is rather academic though, as in most cases I would expect factor analysis to yield "almost nested" solutions, with the difference between one-factor and two-factor solutions being pretty much negligible. This is, by the way, easy to confirm with simulations (I did some quick simulations with Gaussian data). This made me think about how to construct an example where one-factor and two-factor solutions would strongly differ, but so far I have no idea how to do it. CC to @ttnphns. – amoeba Oct 09 '15 at 23:40
  • I, too, see the question as interesting. It seems to be looking deep into theory rather than practice. – ttnphns Oct 10 '15 at 12:33
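
A hedged sketch of the residual-correlation check mentioned in the comments above, continuing with the simulated `dat` from the sketch under the question (the helper name `residuals_of` is just an illustrative choice):

```r
R  <- cor(dat)
f1 <- factanal(dat, factors = 1)
f2 <- factanal(dat, factors = 2, rotation = "none")

residuals_of <- function(fit, R) {
  L    <- unclass(fit$loadings)   # p x k matrix of loadings
  Rhat <- L %*% t(L)              # reproduced common part of the correlations
  diag(Rhat) <- 1                 # unit diagonal (communality + uniqueness)
  R - Rhat
}

# The 2-factor solution should leave smaller, more symmetrically distributed
# off-diagonal residuals if the second factor is real
summary(abs(residuals_of(f1, R)[lower.tri(R)]))
summary(abs(residuals_of(f2, R)[lower.tri(R)]))
```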

3 Answers


The issue you're alluding to is the 'approximate unidimensionality' topic that arose when building psychological testing instruments, which was discussed in the literature quite a bit in the 1980s. The motivation at the time was that practitioners wanted to use traditional item response theory (IRT) models for their items, and those IRT models were exclusively limited to measuring unidimensional traits. So test multidimensionality was treated as a nuisance that (hopefully) could be avoided or ignored. This is also what led to the creation of the parallel analysis techniques in factor analysis (Drasgow and Parsons, 1983) and the DETECT methods. These methods were useful then and still are, because linear factor analysis (what you are referring to) can be a decent limited-information proxy to full-information factor analysis for categorical data (which is what IRT is at its core), and in some cases (e.g., when polychoric correlations are used with a weighted estimator, such as WLSMV or DWLS) can even be asymptotically equivalent to a small selection of ordinal IRT models.
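
As a rough illustration of the parallel-analysis step mentioned above, a minimal sketch assuming the `psych` package and the simulated `dat` from the sketch under the question (with real ordinal items one would typically base this on polychoric rather than Pearson correlations):

```r
library(psych)

# Parallel analysis (ML extraction) to suggest the number of common factors
fa.parallel(dat, fm = "ml", fa = "fa")
```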

The consequence of ignoring additional traits/factors, other than obviously fitting the wrong model to the data (i.e., ignoring information about potential model misfit, though it may of course be trivial), is that trait estimates on the dominant factor will be biased and therefore less efficient. These conclusions of course depend on the properties of the additional traits (e.g., are they correlated with the primary dimension, do they have strong loadings, how many cross-loadings are there, etc.), but the general theme is that estimates of the primary trait obtained while ignoring the secondary structure will be less effective. See the technical report here for a comparison between a misfitted unidimensional model and a bi-factor model; the technical report appears to be exactly what you are after.

From a practical perspective, using information criteria can be helpful when selecting the optimal model, as can model-fit statistics in general (RMSEA, CFI, etc.), because the consequences of ignoring multidimensional information will negatively affect the overall fit to the data. But of course, overall model fit is only one indication that an inappropriate model is being used for the data at hand; it's entirely possible that improper functional forms have been specified (e.g., the true relationships may be non-linear or non-monotonic), so the respective items/variables should always be inspected as well.
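
As one simple, hedged illustration with the simulated `dat` from the sketch under the question: ML extraction via base-R `factanal()` already reports a likelihood-ratio test of "k factors are sufficient" that can be compared across solutions (dedicated packages would add RMSEA, CFI, AIC/BIC, and so on):

```r
f1 <- factanal(dat, factors = 1)
f2 <- factanal(dat, factors = 2, rotation = "none")

# Likelihood-ratio test of "k factors are sufficient" for k = 1 and k = 2
rbind(one_factor = c(stat = f1$STATISTIC, df = f1$dof, p = f1$PVAL),
      two_factor = c(stat = f2$STATISTIC, df = f2$dof, p = f2$PVAL))
```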

See also:

Drasgow, F., & Parsons, C. K. (1983). Application of unidimensional item response theory models to multidimensional data. Applied Psychological Measurement, 7(2), 189-199.

Drasgow, F., & Lissak, R. I. (1983). Modified parallel analysis: A procedure for examining the latent dimensionality of dichotomously scored item responses. Journal of Applied Psychology, 68, 363-373.

Kirisci, L., Hsu, T., & Yu, L. (2001). Robustness of item parameter estimation programs to assumptions of unidimensionality and normality. Applied Psychological Measurement, 25(2), 146-162.

philchalmers
  • Thank you for adding this. This seems to be just what I'm after. – gung - Reinstate Monica Oct 11 '15 at 23:21
  • Do I understand correctly that your answer to the title question is "Yes"? – amoeba Oct 11 '15 at 23:37
  • @amoeba generally, I would say yes, or more that including the extra information should do as well or better than imposing strict unidimensionality. Ignoring known multidimensionality can be very problematic, but of course a number of factors will contribute to this. The only time including the extra information about the structure might be bad is when the sample size is too small to stably estimate the extra parameters; so, bias-efficiency trade-off. But, if sample size isn't much of an issue then I would say there is little to lose from including extra information (but lots to lose if not). – philchalmers Oct 12 '15 at 03:49

If you truly do not want to use the second factor, you should just use a one-factor model. But I am puzzled by your remark that the loadings for the first factor will change if you use a second factor.

Let's deal with that statement first. If you use principal components to extract the factors and do not use factor rotation, then the loadings will not change -- subject perhaps to scaling (or complete flipping: if $x$ is a factor, then $-x$ is a legitimate way to express it as well). If you use maximum likelihood extraction and/or factor rotation, then the loadings may depend on the number of factors you extract.

Next, an explanation of the effects of rotation. I am not good at drawing, so I will try to convince you using words. I will assume that your data are (approximately) normal, so that the factor scores are approximately normal as well. If you extract one factor, you get a one-dimensional normal distribution; if you extract two factors, you get a bivariate normal distribution.

The density of a bivariate normal distribution looks, roughly speaking, like a hat, but the exact shape depends on the scaling as well as on the correlation coefficient. So let's assume that the two components each have unit variance. In the uncorrelated case you get a nice sombrero, with level curves that look like circles. A picture is here. Correlation "squashes" the hat, so that it looks more like a Napoleon hat.

Let's assume that your original data set had three dimensions and you want to extract two factors out of it. Let's also stick with normality. In this case the density is a four-dimensional object, but the level surfaces are objects in three dimensions and can at least be visualized. In the uncorrelated case the level surfaces are spherical (like a soccer ball). In the presence of correlation, the level surfaces will again be distorted, into a football, probably an underinflated one, so that the thickness at the seams is smaller than the thickness in the other directions.

If you extract two factors using PCA, you completely flatten the football into an ellipse (and you project every data point onto the plane of the ellipse). The unrotated first factor corresponds to the long axis of the ellipse, the second factor is perpendicular to it (i.e., the short axis). Rotation then chooses a coordinate system within this ellipse in order to satisfy some other handy criteria.

If you extract just a single factor, rotation is impossible, but you are guaranteed that the extracted PCA factor corresponds to the long axis of the ellipse.
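
A small sketch of this contrast, reusing the simulated `dat` from the sketch under the question (the `psych` package and the particular fitting choices are only illustrative assumptions): unrotated principal-component loadings are nested across solutions, whereas ML factor loadings generally are not.

```r
library(psych)

# Unrotated principal-component loadings ARE nested: the first column is the
# same whether one or two components are retained
p1 <- principal(dat, nfactors = 1, rotate = "none")
p2 <- principal(dat, nfactors = 2, rotate = "none")
cbind(pc_only = p1$loadings[, 1], pc_of_two = p2$loadings[, 1])

# Unrotated ML factor loadings are generally NOT nested: the first column
# differs (here usually only slightly) between the 1- and 2-factor solutions
f1 <- factanal(dat, factors = 1)
f2 <- factanal(dat, factors = 2, rotation = "none")
cbind(fa_only = f1$loadings[, 1], fa_of_two = f2$loadings[, 1])
```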

user3697176
  • I am puzzled by this answer. The question explicitly asks about factor analysis, *as opposed to* principal component analysis. – amoeba Oct 08 '15 at 20:48
  • There are two ways to extract factors: Principal components, or maximum likelihood. I have not done any statistics on this, but I believe the principal component method is used more often. – user3697176 Oct 08 '15 at 21:20
  • There are lots of different methods, more than two. Principal axis, ML, minres, weighted least squares, and more -- I am not an expert here. PCA is perhaps sometimes (rarely!) also considered a method of factor extraction, but that's quite sloppy -- it really should not be. It fits a different model. – amoeba Oct 08 '15 at 21:28
  • Your 1st sentence addresses my Q. It would be nice to hear more about that & why it might be right. Regarding methods to extract factors, @amoeba is right: PCA & PAF were common back when other algorithms were not as well developed or difficult to implement. They are now widely considered inferior. R's `fa()` eg has not used them for years. Other methods will yield non-nested solutions, which is easy to verify w/ software & a FA dataset. For the sake of comparability, you can consider both solutions unrotated. FWIW, I am familiar w/ the idea of spherical & elliptical MVN distributions. – gung - Reinstate Monica Oct 08 '15 at 22:10
  • @gung, a remark. PAF method also gives non-nested solutions. It is a bona fide FA method (albeit based on PCA as a method) and, I suppose, is still widely used. – ttnphns Oct 08 '15 at 23:43

Why would you not use something like lavaan or Mplus to run two models (a unidimensional model and a two-dimensional model aligned with your EFA results) and compare the relative and absolute fit indices of the different models (i.e., information criteria such as AIC and BIC, plus RMSEA, SRMR, and CFI/TLI)? Note that if you go down this road you would not want to use PCA for the EFA, but rather principal factors (principal axis factoring). Somebody really concerned with measurement would embed the CFA into a full structural equation model.

Edit: The approach I'm asking you to consider is more about figuring out how many latent variables actually explain the set of items. If you want to get the best estimate of the larger factor, I would vote for using the factor scores from the CFA model with the better fit, whichever that is.
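
A minimal lavaan sketch of that comparison, reusing the simulated `x1`-`x6` data (`dat`) from the sketch under the question; the bifactor-style specification of the second model, with the small factor kept orthogonal to the general one, is just one illustrative way to write it, not a recommendation:

```r
library(lavaan)

m1 <- 'F1 =~ x1 + x2 + x3 + x4 + x5 + x6'
m2 <- 'F1 =~ x1 + x2 + x3 + x4 + x5 + x6
       F2 =~ x4 + x5 + x6
       F1 ~~ 0*F2'

fit1 <- cfa(m1, data = dat, std.lv = TRUE)
fit2 <- cfa(m2, data = dat, std.lv = TRUE)

# Relative and absolute fit indices for the two measurement models
sapply(list(one_factor = fit1, two_factor = fit2),
       fitMeasures, fit.measures = c("aic", "bic", "rmsea", "srmr", "cfi", "tli"))

# Factor scores for the large (general) factor from the preferred model
head(lavPredict(fit2)[, "F1"])
```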

Erik Ruzek