8

What approaches are there to perform FA on data that is clearly ordinal (or nominal, for that matter) by nature? Should the data be transformed, or are there readily available R packages that can handle this format? What if the data is of a mixed nature, containing numerical, ordinal and nominal variables?

The data is from a survey where subjects have answered questions of many types: yes/no; continuous; scales. My aim is to use FA as a method for analyzing the underlying factors. I do not yet know what factors I'm looking for. However, condensing the underlying factors into a manageable number of factors is important.

EDIT: Also, can I approximate a survey question answered on the Likert-type scale as a continuous variable?

Thank you.

Figaro
  • 2
    If you look at the Related threads on the right-hand side, there are several that appear to answer very similar (if not identical) questions, such as this [one](http://stats.stackexchange.com/q/5502/1036) and this [one](http://stats.stackexchange.com/q/11372/1036). – Andy W Jun 14 '11 at 13:31
  • 3
    I've edited your title; the question about nominal data is important in that it's why this question is not a duplicate. – JMS Jun 14 '11 at 13:53
  • @JMS Given that the nominal data would have to be represented as dummy indicators, doesn't that point us right back to the question about [factor analysis of dyadic data](http://stats.stackexchange.com/questions/3006/factor-analysis-of-dyadic-data)? – whuber Jun 14 '11 at 14:18
  • For what it's worth, my vote would be to modify the question to extract the unique elements: i.e., (a) how to factor analyse data that is based on a mixture of different data types, and (b) how to factor analyse nominal data. @whuber I suppose approaches like optimal scaling of nominal data are an alternative to dummy indicators. (@Figaro are you able to update the question to make it a non-duplicate?) – Jeromy Anglim Jun 14 '11 at 14:58
  • @Figaro What would be the purpose of the FA in your context? To compute individual composite scores or to identify item clusters? Do the nominal variables define a between-subject factor or a group of variables, or are they just another "measurement"? (I'm thinking of some of the factor-related methods available in the [FactoMineR](http://factominer.free.fr/advanced-methods/) package.) – chl Jun 14 '11 at 15:00
  • @Jeromy Scaling of *ordinal* data makes sense, but how would optimal scaling of purely nominal (categorical) data work? Could you point me to a place where I could learn about this? – whuber Jun 14 '11 at 15:03
  • @Figaro So, what do you want to show? That there's good agreement between responses for people sharing similar characteristics, that we can construct a typology of the respondents given their pattern of responses, that some items tend to hang together, etc.? I would suggest editing your question directly to make your goals with such data clearer. – chl Jun 14 '11 at 15:17
  • @whuber I've encountered CatPCA in SPSS http://mondi.web.elte.hu/spssdoku/algoritmusok/catpca.pdf . It also looks like the `homals` package in R http://cran.r-project.org/web/packages/homals/index.html implements the same or similar methods. – Jeromy Anglim Jun 14 '11 at 15:40
  • @Jeromy Thank you. I see what's going on: internally the nominal variables *are* being represented as dummy indicators via the matrix $G$. CatPCA then "scores" these dummies to maximize their correlations with the dependent variable. That's a brave thing to do, given that it's inherently biased, but I can see its utility in data exploration. – whuber Jun 14 '11 at 16:40
  • @whuber I don't think so; it depends whether the OP strictly wants to do FA. There are "FA-like" models which don't require transforming to dummy variables. Dummy variables aren't a great idea if you have a nominal variable with >2 categories where the categories are exclusive. – JMS Jun 14 '11 at 16:47
  • @JMS I don't think there's a choice, if I read the [SPSS docs](http://www.unt.edu/rss/class/Jon/SPSS_SC/Module9/M9_CatReg/SWPOPT.pdf) correctly. The *user* might not need to use dummies, but the SPSS algorithm does. But maybe I misunderstand what it's doing. – whuber Jun 14 '11 at 16:50
  • 1
    Also, an example of handling nominal data with a discrete-choice type specification is given [here](http://www.intlpress.com/SII/p/2008/1-1/SII-1-1-A9-Cai.pdf), for example. – JMS Jun 14 '11 at 16:50
  • @whuber The OP doesn't mention SPSS; I wasn't referring to it either. – JMS Jun 14 '11 at 16:51
  • @whuber To clarify, my first @ to you was in re: your first @ to me :) – JMS Jun 14 '11 at 16:52
  • I should add that unless @Figaro has nominal data with more than 2 levels I think this is an exact duplicate. – JMS Jun 14 '11 at 16:55
  • @JMS Very interesting approach. (The reference to SPSS came from @Jeromy, by the way.) – whuber Jun 14 '11 at 16:56
  • @whuber that's what I thought; danger of rapid fire commenting :) – JMS Jun 14 '11 at 17:06
  • @JMS Most of the nominal data has more than 2 levels: 5 to 7 levels on most of these variables. I do not see this as a direct duplicate. Am I right? I do need to point out that the psychometrics side of these methods (and of others that answer similar questions) is quite unfamiliar to me, and I would very much like to stick to FA. But is it possible? What if the problem is limited to only my nominal variables? – Figaro Jun 14 '11 at 19:04
  • @Figaro I'm not aware of another question here dealing with FA on mixed data that includes categorical variables. Perhaps you could edit your post to include more information on your data, describing the variables, etc? We should be able to give a proper answer then, or point you to some resources. – JMS Jun 14 '11 at 20:33
  • @JMS @Figaro A discussion of this particular issue began to emerge [here](http://bit.ly/mul43b) or [here](http://bit.ly/kF9sPF), for example. I think it is an interesting question because it raises an interesting issue with respect to (a) the domain of measurement (of psychological traits) -- i.e., properly scaling attributes assessed on different scales of measurement, for which we can inherently postulate different liability or threshold models for binary or polytomously-scored items -- and (b) factor-related methods for mixed data types. Anyway, I agree that it would be helpful to get more info on the data. – chl Jun 14 '11 at 20:54
  • Can I approximate a survey question answered on the Likert-type scale as a continuous variable? – Figaro Jun 15 '11 at 06:51
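
To make the dummy-indicator discussion in the comments concrete, here is a minimal sketch (in plain Python, purely for illustration) of the indicator matrix, called $G$ above, for a single nominal variable with more than two mutually exclusive categories. The variable `responses` and the helper `indicator_matrix` are hypothetical names and data, not taken from the survey in question or from any package.

```python
# Hypothetical survey item with three mutually exclusive categories.
responses = ["car", "bus", "bike", "car", "bike", "car"]


def indicator_matrix(values):
    """Return (levels, G), where G[i][j] is 1 if values[i] equals levels[j], else 0."""
    levels = sorted(set(values))
    G = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, G


levels, G = indicator_matrix(responses)

# Print the column order, then one 0/1 row per respondent.
print(levels)
for row in G:
    print(row)

# Because the categories are exclusive, every row contains exactly one 1.
row_sums = [sum(row) for row in G]
print(row_sums)
```

Since each row sums to 1, the indicator columns of one nominal variable are linearly dependent. This is one reason why feeding raw dummies into an ordinary factor analysis is awkward when a variable has more than two exclusive categories.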

2 Answers

3

Particularly if you have nominal indicators along with the ordinal & continuous ones, this is probably a good candidate for latent class factor analysis.

Take a look at this -- http://web.archive.org/web/20130502181643/http://www.statisticalinnovations.com/articles/bozdogan.pdf

Silverfish
dmk38
  • Link is broken. Would appreciate if you update it. Thanks – MYaseen208 Nov 13 '15 at 13:44
  • The link should work now, I have added an archived version. – Silverfish Dec 13 '15 at 19:26
  • 1
    Here's an Internet Archive cache of the PDF linked in dmk38's answer (since I can't comment on answers yet): http://web.archive.org/web/20130502181643/http://www.statisticalinnovations.com/articles/bozdogan.pdf –  Dec 13 '15 at 19:13
2

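FactoMineR is a nice R package for factor analysis on mixed variables; its `FAMD` function (factor analysis of mixed data) handles continuous and categorical variables together.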
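
As a rough illustration of what "mixed variables" involves, the sketch below shows one commonly described preprocessing recipe for factor analysis of mixed data. Continuous variables are standardized. Each 0/1 category indicator is centered and divided by the square root of its category proportion, so that a subsequent PCA on the combined table treats both kinds of variables on a comparable footing. This is a plain-Python illustration under that assumption, not FactoMineR's API or its actual code; the data and the helper names `standardize` and `scaled_indicators` are hypothetical.

```python
import math

# Hypothetical mixed survey data: one continuous and one nominal variable.
age = [23, 35, 47, 51, 62, 30]
transport = ["car", "bus", "bike", "car", "bike", "car"]


def standardize(column):
    """Center a numeric column and scale it to unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / sd for x in column]


def scaled_indicators(values):
    """Return {level: column}, each column being (indicator - p) / sqrt(p),
    where p is the observed proportion of that level (assumed scaling)."""
    n = len(values)
    columns = {}
    for level in sorted(set(values)):
        p = sum(1 for v in values if v == level) / n
        columns[level] = [((1 if v == level else 0) - p) / math.sqrt(p) for v in values]
    return columns


z_age = standardize(age)
z_transport = scaled_indicators(transport)


def variance(column):
    """Population variance of a column."""
    m = sum(column) / len(column)
    return sum((x - m) ** 2 for x in column) / len(column)


# Total variance contributed by the nominal variable's indicator columns.
nominal_total_variance = sum(variance(col) for col in z_transport.values())
print(variance(z_age), nominal_total_variance)
```

Under this scaling the continuous variable contributes unit variance, while a nominal variable with $m$ levels contributes $m - 1$ in total, spread over its indicator columns. A PCA of the combined columns would then be the next step, which the sketch leaves out.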

MYaseen208