
I am learning about Item Response Theory (IRT), in which items are used to assess ability. In principle, multiple latent abilities may exist, and some items test one ability while other items test another. This is similar to Factor Analysis, and exploratory factor analysis (EFA) can be used to discover how many latent abilities exist.

Could someone explain the connection between IRT and EFA to me and, in particular, explain in what way they are different?

As an example of how similar yet different the two approaches are, consider the following R code, in which I generate two independent factors, each with three partially correlated items. Why are the results not identical?

library(MASS)   # mvrnorm() for multivariate normal draws
library(mirt)   # exploratory IRT
library(psych)  # exploratory factor analysis

set.seed(5)
# Within-block correlation matrix: three items pairwise correlated at 0.5
Sigma <- matrix(c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow=3, ncol=3)
mu <- c(0, 0, 0)
# Two independent blocks of three correlated items, one block per factor
values_1 <- mvrnorm(n=100, mu=mu, Sigma=Sigma)
values_2 <- mvrnorm(n=100, mu=mu, Sigma=Sigma)
data <- cbind(values_1, values_2)
colnames(data) <- c("v1", "v2", "v3", "v4", "v5", "v6")

cor(data)
#            v1          v2           v3          v4          v5           v6
#v1  1.00000000  0.47780822  0.467372048 -0.03945821 -0.12578462 -0.039831938
#v2  0.47780822  1.00000000  0.439688257  0.05177551 -0.05475685  0.059792099
#v3  0.46737205  0.43968826  1.000000000 -0.03951421 -0.04625965 -0.005905009
#v4 -0.03945821  0.05177551 -0.039514211  1.00000000  0.52541143  0.548299469
#v5 -0.12578462 -0.05475685 -0.046259648  0.52541143  1.00000000  0.540397739
#v6 -0.03983194  0.05979210 -0.005905009  0.54829947  0.54039774  1.000000000

model <- mirt(data, 2)
#Iteration: 61, Log-Lik: -535.353, Max-Change: 0.00010

summary(model, rotate="varimax")
#
#Rotation:  varimax 
#
#Rotated factor loadings: 
#
#         F1        F2    h2
#v1 -0.16213 -0.800794 0.668
#v2  0.05058 -0.712119 0.510
#v3  0.00323 -0.786802 0.619
#v4  0.68827  0.030944 0.475
#v5  0.74845  0.113504 0.573
#v6  0.78273  0.000103 0.613
#
#Rotated SS loadings:  1.675 1.781 
#
#Factor correlations: 
#
#   F1 F2
#F1  1  0
#F2  0  1

fa(data, nfactors=2, rotate="varimax")
#Factor Analysis using method =  minres
#Call: fa(r = data, nfactors = 2, rotate = "varimax")
#Standardized loadings (pattern matrix) based upon correlation matrix
#     MR1   MR2   h2   u2 com
#v1 -0.08  0.71 0.52 0.48   1
#v2  0.04  0.68 0.46 0.54   1
#v3 -0.03  0.65 0.42 0.58   1
#v4  0.73  0.01 0.53 0.47   1
#v5  0.72 -0.09 0.53 0.47   1
#v6  0.75  0.03 0.57 0.43   1
#
#                       MR1  MR2
#SS loadings           1.63 1.40
#Proportion Var        0.27 0.23
#Cumulative Var        0.27 0.50
#Proportion Explained  0.54 0.46
#Cumulative Proportion 0.54 1.00        

summary(model, rotate="oblimin")
#
#Rotation:  oblimin 
#
#Rotated factor loadings: 
#
#        F1      F2    h2
#v1 -0.1180 -0.7949 0.668
#v2  0.0909 -0.7188 0.510
#v3  0.0475 -0.7909 0.619
#v4  0.6902 -0.0110 0.475
#v5  0.7460  0.0683 0.573
#v6  0.7869 -0.0478 0.613
#
#Rotated SS loadings:  1.676 1.781 
#
#Factor correlations: 
#
#      F1    F2
#F1 1.000 0.116
#F2 0.116 1.000

fa(data, nfactors=2, rotate="oblimin")
#Factor Analysis using method =  minres
#Call: fa(r = data, nfactors = 2, rotate = "oblimin")
#Standardized loadings (pattern matrix) based upon correlation matrix
#     MR1   MR2   h2   u2 com
#v1 -0.05  0.71 0.52 0.48   1
#v2  0.06  0.68 0.46 0.54   1
#v3 -0.01  0.65 0.42 0.58   1
#v4  0.73  0.02 0.53 0.47   1
#v5  0.72 -0.08 0.53 0.47   1
#v6  0.76  0.04 0.57 0.43   1
#
#                       MR1  MR2
#SS loadings           1.63 1.40
#Proportion Var        0.27 0.23
#Cumulative Var        0.27 0.50
#Proportion Explained  0.54 0.46
#Cumulative Proportion 0.54 1.00
LBogaardt
  • I'm comparing the exploratory IRT and the traditional EFA estimates. They both identify two factors/latent traits with 3 items loading on each. The magnitudes of the factor loadings are pretty similar. The exploratory IRT estimates have the signs flipped, which I can't really explain. I would say that they are more similar than different. – Weiwen Ng Oct 25 '19 at 14:32
  • A point of information: IRT is designed for binary or ordinal questions. I am not that familiar with R, but it looks like in your data generating process, you are creating multivariate normal (i.e. continuous) items. I don't think this is really IRT anymore. I don't know the default behavior of the `mirt` package when it's fed continuous items. You may want to explain more. – Weiwen Ng Oct 25 '19 at 14:33
  • True, I should really compare binary or polytomous data instead of normally distributed data. – LBogaardt Oct 25 '19 at 16:01
  • In that case, `mirt` has some functions to generate data from a specified IRT model with polytomous indicators (or binary ones). I am not familiar with the `psych` package, but it should either use a polychoric correlation matrix in factor analysis, or you should be able to tell it to do so. I would ensure that this is done if you want to update your code. Whatever the case, it would help readers if you annotated the code more; some of us will be familiar with R, but not all may be, or we may not be familiar with some functions or packages. – Weiwen Ng Oct 25 '19 at 16:04
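
Following the suggestions in these comments, a minimal sketch of such an updated comparison might look like the following; the slopes, intercepts, and sample size are arbitrary illustrative choices, and `cor = "poly"` asks `psych::fa` to work from polychoric rather than Pearson correlations.

library(mirt)
library(psych)

set.seed(5)
# Simple structure: items 1-3 load on trait 1, items 4-6 on trait 2
a <- matrix(c(1.5, 0,
              1.2, 0,
              1.4, 0,
              0,   1.3,
              0,   1.6,
              0,   1.4), ncol = 2, byrow = TRUE)  # discriminations (slopes)
d <- matrix(rnorm(6))                             # item intercepts
dich <- simdata(a = a, d = d, N = 1000, itemtype = 'dich')

# Exploratory IRT on the binary items
mirt(dich, 2)

# EFA on the polychoric correlation matrix
fa(dich, nfactors = 2, rotate = "oblimin", cor = "poly")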

1 Answer


A better question might be: compare exploratory factor analysis (EFA) to confirmatory factor analysis (CFA). IRT, as I will explain later, can be thought of as a type of CFA.

Exploratory vs. Confirmatory Factor Analysis

In EFA, you take a bunch of items (or questions, variables, etc.) and you ask the computer to determine the structure. No a priori structure is supplied aside from the choice between correlated and uncorrelated factors. The program finds all the possible factors for you. There are a number of criteria for determining the number of latent traits (i.e. factors). Traditional criteria include creating a scree plot and retaining all factors above the elbow, or retaining factors with eigenvalues greater than 1.
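
For instance, a minimal sketch of both criteria with the `psych` package, applied to the simulated data from the question:

library(psych)

# Scree plot: look for the elbow, and for eigenvalues > 1
scree(data)

# Parallel analysis: retain factors whose eigenvalues exceed
# those obtained from comparable random data
fa.parallel(data)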

In CFA (remember, IRT is a type of CFA), you tell the computer that you already know the structure, including how many latent traits there are and which items load on which traits. You don't find the factors in CFA; you set the structure based on some a priori knowledge. The computer fits the model and estimates the loadings, intercepts, and global fit statistics. You can then supply an alternative structure and compare its global fit statistics to the first model's, or (I believe) you can do something like a likelihood ratio test or compare BIC.

For example, you could assume unidimensionality, and then fit some sort of multidimensional model for comparison. You can choose an alternative model based on expert knowledge; for example, see this article by Belanger and colleagues. Alternatively, I think there are some criteria in CFA that can lead you to an alternative model (e.g. error correlations?). I'm less familiar with these.
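
As a sketch of that workflow, here is one way to do the comparison in `lavaan` (my choice of package, not something the question used), fitting a unidimensional and a two-factor model to the simulated data above:

library(lavaan)

df <- as.data.frame(data)

# Model 1: all six items load on a single latent trait
m1 <- 'F =~ v1 + v2 + v3 + v4 + v5 + v6'

# Model 2: the a priori two-factor structure
m2 <- 'F1 =~ v1 + v2 + v3
       F2 =~ v4 + v5 + v6'

fit1 <- cfa(m1, data = df)
fit2 <- cfa(m2, data = df)

# Likelihood ratio test, plus AIC/BIC for comparison
anova(fit1, fit2)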

Edit: you asked in the comments how one goes from a traditional unidimensional IRT model to some alternative structure, and how one would know which alternative structures to explore. You asked if you could use EFA for this, and I believe that would be acceptable.

However, we don't often get a bunch of questions with absolutely no a priori knowledge of what factors might exist. We often have some sense of possible dimensions. Presumably, we would have got this sense during scale development. For example, this article by Rosalie Kane et al (disclosure: she's my late advisor's wife, and she helped arrange some funding for me during my PhD program) goes through the development process of a multi-dimensional scale for quality of life in nursing homes. It started from the ground up with qualitative interviews of residents and long-term care experts.

Anyway, one will often have some sense of an alternative structure that's a bit different from the original; for example, if you read Belanger et al.'s work on the PHQ-9 depression questionnaire, you'll see that they treated the PHQ-9 as unidimensional, and then tried a slightly different specification based on other research. That alternative specification treated 3 of the 9 items as somatic symptoms of depression and the remaining items as affective/cognitive symptoms.

IRT is a form of (generalized) CFA

Now, CFA is based on linear regression. I think that normally, CFA assumes the items are continuous and multivariate normal, although estimators are available that are robust to departures from this assumption (e.g. weighted least squares). As an alternative, you can specify a generalized CFA, where you assume the items follow, for example, a logistic, ordinal logistic, or Poisson model.
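
In `lavaan`, for example (again my choice of package), declaring items as ordered switches to a probit-type measurement model estimated by a weighted least squares variant; a minimal sketch, assuming a hypothetical data frame `binary_df` of 0/1 items `y1`–`y6`:

library(lavaan)

# 'ordered' treats the items as ordinal; lavaan then fits a
# probit-type model to the polychoric correlations (WLSMV-style)
fit <- cfa('F1 =~ y1 + y2 + y3
            F2 =~ y4 + y5 + y6',
           data = binary_df,  # hypothetical data frame of binary items
           ordered = c("y1", "y2", "y3", "y4", "y5", "y6"))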

Item response theory is simply a non-linear CFA with binary or ordinal response functions as appropriate. IRT does have different identifying constraints (see below) than traditional CFA, but it's still fundamentally a type of CFA. Most IRT models are unidimensional ones, but you can fit various types of multidimensional IRT models as well. Both CFA and EFA have factor loadings; in IRT, the factor loading is the discrimination parameter.

Identifying Constraints in IRT vs. CFA and the impact on the loading/discrimination parameter

Interested readers note: in traditional CFA models, one item per factor has its loading (the analogue of the IRT discrimination) arbitrarily constrained to 1. This is for identification purposes; you cannot identify all the factor loadings otherwise. Please Google for more info.

IRT makes a different arbitrary constraint: the variance of the latent trait is constrained to 1. In both traditional CFA and IRT, I believe the latent trait is assumed to be normally distributed with a mean of 0; in CFA, you estimate the variance, while in IRT, you constrain it.
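
In `lavaan`, the two identification schemes are a single switch apart (a sketch, again with my choice of package):

library(lavaan)

m2 <- 'F1 =~ v1 + v2 + v3
       F2 =~ v4 + v5 + v6'

# CFA-style identification: first loading per factor fixed to 1,
# factor variances freely estimated (lavaan's default)
fit_marker <- cfa(m2, data = as.data.frame(data))

# IRT-style identification: all loadings free, factor variances fixed to 1
fit_stdlv <- cfa(m2, data = as.data.frame(data), std.lv = TRUE)

# Identical fit, different parameterization
c(logLik(fit_marker), logLik(fit_stdlv))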

Had you fit an IRT model vs. a non-linear CFA, the loading parameters would thus not be directly comparable. They're conceptually the same thing; they're just expressed on different scales.
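
If I recall correctly, under the normal-ogive (probit) parameterization the two metrics are related by a standard conversion (for a unidimensional model with the latent trait variance fixed to 1; logistic-metric slopes should first be divided by the scaling constant $D \approx 1.7$):

$$\lambda_j = \frac{a_j}{\sqrt{1 + a_j^2}}, \qquad a_j = \frac{\lambda_j}{\sqrt{1 - \lambda_j^2}}.$$

In the multidimensional case the denominator becomes $\sqrt{1 + \sum_k a_{jk}^2}$, which, I believe, is essentially how `mirt`'s `summary()` reports slopes as factor loadings.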

Misc Comments

I note that the mirt R package contains an exploratory IRT function, which you invoked. I am not familiar with this application of IRT, and I will defer to Phil Chalmers, the package author, to explain. He often posts here.

If you haven't noticed yet, both your EFA and your exploratory IRT identify two latent traits. In EFA, I believe that all items load on all traits/factors, but items often have a clear primary factor they load on. You can see that both your EFAs put the first three items on one factor and the second three on a second factor, as did your exploratory IRT. In CFA and IRT, each item loads only on the factor(s) you specify. Had you run a two-factor CFA (linear or non-linear), you'd probably have told it that each group of three items loads on only one factor, based on those EFA results, as in the sketch below.
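
In `mirt` itself, that confirmatory two-factor specification would look something like the following sketch, using `mirt.model()` syntax:

library(mirt)

# Confirmatory structure: items 1-3 on F1, items 4-6 on F2,
# allowing the two latent traits to correlate
spec <- mirt.model('
  F1 = 1-3
  F2 = 4-6
  COV = F1*F2
')

cfit <- mirt(data, spec)
summary(cfit)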

Weiwen Ng
  • Thanks for your detailed answer. Another question though: if IRT is more similar to CFA, how does one find the number of factors without a priori knowledge? Can one first do an EFA and use standard methods such as a scree plot to determine this? – LBogaardt Oct 25 '19 at 07:51
  • @LBogaardt yes, you can do an EFA. However, how often do you receive a bunch of questions with absolutely no indication of what factors might exist? We would usually get some sense of this during the measure development process. For example, in depression, we know that the symptoms can be related to affect (I feel sad), cognition (I feel worthless), or somatic symptoms (I’m much more tired than normal) – Weiwen Ng Oct 25 '19 at 14:14