Polychoric factor analysis and use of the factor score in subsequent models

Question

To predict the impact of gender egalitarianism on life satisfaction (a 7-point ordinal variable), I wanted to create a factor score from a group of five related variables (e.g. "mothers should work", agree to disagree on a 5-point scale; "men should have more right to a job when jobs are scarce", agree to disagree on a 7-point scale). As these are all ordinal, I decided to use polychoric factor analysis. One factor score was created as a result (please see the analysis below; I hope I did it right). When I looked it up in the data browser, the factor score looked like a continuous variable.

Can I add it directly to my model as an independent/control variable, without putting an i. in front? And can I interpret it as follows? Example: individuals with a more egalitarian attitude towards gender equality are statistically significantly more likely to be satisfied with their lives, with an ordered log-odds coefficient of X.
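For concreteness, the model being asked about might look like the sketch below. The names `lifesat`, `female` and `age` are hypothetical placeholders that do not appear in the post; only `Factor1` comes from the log. In Stata a covariate without a prefix is treated as continuous, so `Factor1` and `c.Factor1` specify the same term, while `i.Factor1` would be rejected because factor variables must be nonnegative integers.

```stata
* Hypothetical variable names: lifesat (7-point outcome), female, age.
* Factor1 is the score created by -predict- after -factormat-.
ologit lifesat c.Factor1 i.female c.age

* Same model, reported as odds ratios instead of ordered log odds
ologit lifesat c.Factor1 i.female c.age, or
```

The coefficient on `Factor1` is the change in the ordered log odds of being in a higher satisfaction category per one-unit increase in the score. Whether a higher score means "more egalitarian" depends on how the items are coded (agree to disagree), so the direction should be checked against the signs of the loadings before wording the interpretation.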

. polychoric motherworks menbusiness housewifebetter menrightjob wommoreinc

Polychoric correlation matrix

. display r(sum_w)
65408

. matrix r = r(R)

. factormat r, n(65146) factors(1)
(obs=65,146)

Factor analysis/correlation                 Number of obs    =     65,146
    Method: principal factors               Retained factors =          1
    Rotation: (unrotated)                   Number of params =          5

--------------------------------------------------------------------------
     Factor  |   Eigenvalue   Difference        Proportion   Cumulative
-------------+------------------------------------------------------------
    Factor1  |      1.53825      1.47170            1.2592       1.2592
    Factor2  |      0.06654      0.07187            0.0545       1.3137
    Factor3  |     -0.00533      0.15521           -0.0044       1.3093
    Factor4  |     -0.16053      0.05682           -0.1314       1.1779
    Factor5  |     -0.21735            .           -0.1779       1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated:  chi2(10) = 5.5e+04 Prob>chi2 = 0.0000

Factor loadings (pattern matrix) and unique variances

---------------------------------------
    Variable |  Factor1 |   Uniqueness 
-------------+----------+--------------
 motherworks |   0.4601 |      0.7883  
 menbusiness |   0.6726 |      0.5476  
housewifeb~r |   0.3166 |      0.8998  
 menrightjob |   0.6983 |      0.5124  
  wommoreinc |   0.5351 |      0.7137  
---------------------------------------

. predict Factor1
(regression scoring assumed)

Scoring coefficients (method = regression)

------------------------
    Variable |  Factor1 
-------------+----------
 motherworks |  0.17576 
 menbusiness |  0.32150 
housewifeb~r |  0.10528 
 menrightjob |  0.35539 
  wommoreinc |  0.21144 
------------------------

(variable means assumed 0; use means() option of factormat for nonzero means)
(variable std. deviations assumed 1; use sds() option of factormat to change)
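The two notes at the end of the log affect what `predict` actually computed. As a sketch, assuming regression scoring applies the coefficients $b_j$ to each item after centering by $m_j$ and scaling by $s_j$ (the means and standard deviations known to `factormat`), the score for respondent $i$ is

$$
\hat F_i = \sum_{j=1}^{5} b_j \,\frac{x_{ij} - m_j}{s_j}.
$$

With $m_j = 0$ and $s_j = 1$ assumed, as the log reports, this reduces to a weighted sum of the raw item codes:

$$
\hat F_i = 0.17576\,x_{i,\text{motherworks}} + 0.32150\,x_{i,\text{menbusiness}} + 0.10528\,x_{i,\text{housewifebetter}} + 0.35539\,x_{i,\text{menrightjob}} + 0.21144\,x_{i,\text{wommoreinc}}.
$$

Under that reading, the 5-point and 7-point codes enter as if they were interval-scaled and already standardized, even though the loadings were estimated from polychoric correlations.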

When we do FA based on poly- or tetrachoric correlations, we must not use the traditional methods of computing factor scores, because these methods assume that the data values directly correspond to the correlations (and hence the loadings). That is not the case here: there is no direct, linear link between the data and the polychoric correlations. Special methods should be used: a posteriori or expectation-maximization methods of factor scores. I don't know the details of these algorithms. Please dig into packages and the literature, or this site. If you find a description of the methods, please let us know. - ttnphns, Sep 12 '20 at 21:32
@ttnphns I would *love* a good seminal theory reference on polychoric and/or tetrachoric factor analysis. I have had several requests to integrate such methods into some of my software, but never landed on anything other than applications and software guides. - Alexis, Sep 12 '20 at 23:58
@Alexis, I very briefly touch on the topic [here](https://stats.stackexchange.com/a/215483/3277), and there are local links there. EFA on polychoric correlations is not far from the usual EFA. The three points to keep in mind are: (i) polychoric r may "forget" the _multi_variate information which the original r still "remembers"; (ii) the matrix of polychoric r may need "smoothing" to become positive definite; (iii) the major problem of estimating factor scores, since the original dataset no longer corresponds directly to the loadings. Otherwise it is straightforward. - ttnphns, Sep 13 '20 at 00:16
David Bartholomew was one of those FA experts who wrote a lot on FA of ordinal or binary data. - ttnphns, Sep 13 '20 at 00:23
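
One way to obtain posterior-based scores of the kind mentioned in the first comment, without going through `factormat`, is to fit an ordinal item response model to the raw items. The sketch below is an assumption about a workable alternative, not the method the commenters had in mind: it uses Stata's built-in `irt grm` (graded response model, Stata 14 or later) and its `predict, latent` option, which returns empirical Bayes means of the latent trait. The new variable names `theta` and `theta_se` are hypothetical.

```stata
* Graded response model for the five ordinal items from the question.
* Items may have different numbers of categories (5-point and 7-point).
irt grm motherworks menbusiness housewifebetter menrightjob wommoreinc

* Empirical Bayes (posterior mean) score for each respondent,
* with its standard error; theta and theta_se are hypothetical names.
predict theta, latent se(theta_se)
```

The resulting `theta` is continuous and could enter a later model in the same way as `Factor1`. Its estimation error is ignored if it is simply used as a regressor.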
