I plan to run a series of exploratory factor analysis (EFA) models to investigate the factor structure of a scale in development using the R package psych. My N > 300 and each manifest variable (i.e., indicator/item) uses a 5-point Likert-type response option. Some items were reverse-coded (to my chagrin) but handled accordingly; there are no missing data. The structure of the data is as follows:
str(dat)
'data.frame': 315 obs. of 33 variables:
Given the scale's domain content (social science/higher ed), I presumed the data were ordinal, but I calculated both the Pearson and polychoric correlations (because science!) first:
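library(psych)
library(GPArotation)
#library(corrplot)
# Store both correlation matrices in one list for side-by-side comparison
corr_list = list(
  pearson = cor(dat),            # Pearson: treats the items as interval
  poly = polychoric(dat)$rho)    # polychoric: treats the items as ordinal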
When running polychoric correlations, a warning message is generated stating that a correction for continuity was applied and 526 cells were adjusted (see here). I understand that polychoric correlations are estimated from a table of proportions and, since some cells of that table end up empty, a correction is needed. My first question concerns this:
- Are the polychoric correlation coefficients stable (p. 21) given that 526 cells were adjusted? This was a bit perturbing considering dat only has 315 obs. Or is the solution adequate because the adjusted cells only account for about 5% of the overall data structure (315 rows * 33 variables = 10,395 total cells)?
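To put the 526 in context, it may help to count empty cells directly. The sketch below assumes the warning refers to zero-frequency cells in the pairwise (item-by-item) contingency tables rather than to cells of the data frame. Under that assumption the natural denominator is choose(33, 2) = 528 item pairs times 25 cells per 5 x 5 table = 13,200 cells. Note that dat_sim and count_empty_cells() are hypothetical stand-ins (simulated data and an illustrative helper), not part of psych, and the simulated count will not match the real data.

```r
# Hypothetical stand-in for dat: 315 respondents, 33 five-point items,
# with skewed response probabilities so that some categories are sparse.
set.seed(1)
dat_sim <- as.data.frame(
  matrix(sample(1:5, 315 * 33, replace = TRUE,
                prob = c(0.05, 0.10, 0.20, 0.35, 0.30)),
         nrow = 315)
)

# Illustrative helper (an assumption, not a psych function):
# counts zero-frequency cells across all pairwise 5 x 5 tables.
count_empty_cells <- function(d, levels = 1:5) {
  pairs <- combn(ncol(d), 2)  # one column per item pair
  sum(apply(pairs, 2, function(ix) {
    tab <- table(factor(d[[ix[1]]], levels = levels),
                 factor(d[[ix[2]]], levels = levels))
    sum(tab == 0)
  }))
}

n_pairs     <- choose(ncol(dat_sim), 2)  # number of item pairs
total_cells <- n_pairs * 5 * 5           # cells across all pairwise tables
empty       <- count_empty_cells(dat_sim)

# Share of pairwise-table cells that are empty
empty / total_cells
```

Swapping dat in for dat_sim would give the count for the real data. It need not equal 526 exactly, since psych may define the table dimensions differently (for example, from the observed response range).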
Different estimation methods for EFAs are available in the psych package, but the EFA function also has an argument that specifies the type of correlation (e.g., "cor" [Pearson], "poly", "mixed", etc.); the type of correlation in conjunction with the estimation method can have a drastic impact on the solutions provided, so I wanted to gain as much clarity about the correlation output as possible before moving forward. EFA follow-up on the horizon!
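A minimal sketch of how the correlation type can be varied while holding the estimation method fixed. It assumes psych's fa() with its cor and fm arguments. The data (sim), the single-factor structure, and the 0.7 loading are invented for illustration and say nothing about the real scale.

```r
library(psych)

# Hypothetical data: 8 items driven by one latent factor (true loading 0.7),
# each cut into 5 ordered categories to mimic Likert-type responses.
set.seed(42)
n <- 315
latent <- rnorm(n)
sim <- as.data.frame(sapply(1:8, function(j) {
  y <- 0.7 * latent + rnorm(n, sd = sqrt(1 - 0.7^2))
  as.integer(cut(y, breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf)))
}))
names(sim) <- paste0("item", 1:8)

# Same estimation method (minres), different correlation type.
# nfactors = 1 is an assumption made only to keep the example small.
efa_pearson <- fa(sim, nfactors = 1, fm = "minres", cor = "cor")
efa_poly    <- fa(sim, nfactors = 1, fm = "minres", cor = "poly")

# Side-by-side loadings; absolute values avoid arbitrary sign flips.
loadings_cmp <- data.frame(
  pearson = abs(as.numeric(efa_pearson$loadings[, 1])),
  poly    = abs(as.numeric(efa_poly$loadings[, 1])),
  row.names = names(sim)
)
round(loadings_cmp, 2)
```

With coarsely categorized items, Pearson-based loadings are typically somewhat attenuated relative to polychoric-based ones, so a comparison like this can show how much the choice matters before committing to one.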