
I have 8 cognitive behaviour variables (all continuous) and would like to combine them into a composite score. I would then like to find the best predictors of this composite outcome among about 50 candidate predictors.

I am interested in whether there are alternatives to PCA/factor analysis or latent variable model approaches that can capture features which are non-linearly related to the input variables. I am aware of non-linear PCA, but as a classical statistician I would like to know whether there are other methods from the field of machine learning.
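Kernel PCA is one machine-learning method that fits this description: it performs ordinary PCA on an implicit non-linear feature map defined by a kernel, so the first kernel component can act as a non-linear composite score. Below is a minimal pure-Python sketch, assuming a Gaussian (RBF) kernel, the common bandwidth heuristic `gamma = 1 / number of variables`, and simulated data; these choices are illustrative and not from the question. For real analyses an established implementation (for example scikit-learn's `KernelPCA`) would be preferable.

```python
import math
import random


def standardize(rows):
    """Z-score each column so the 8 variables are on a comparable scale."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) or 1.0
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def rbf_kernel(rows, gamma):
    """Gaussian (RBF) kernel: K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in rows] for xi in rows]


def center_kernel(K):
    """Double-centre K, which centres the implicit feature vectors."""
    n = len(K)
    means = [sum(row) / n for row in K]  # row means equal column means (K is symmetric)
    grand = sum(means) / n
    return [[K[i][j] - means[i] - means[j] + grand for j in range(n)]
            for i in range(n)]


def leading_eigenpair(M, max_iter=2000, tol=1e-12, seed=0):
    """Power iteration: largest eigenvalue of a PSD matrix and its unit eigenvector."""
    n = len(M)
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    lam = 0.0
    for _ in range(max_iter):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))
        if lam == 0.0:
            break
        w = [x / lam for x in w]
        done = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if done:
            break
    return lam, v


def kernel_pca_composite(rows, gamma=None):
    """First kernel principal component score for each case (sign is arbitrary)."""
    z = standardize(rows)
    if gamma is None:
        gamma = 1.0 / len(z[0])  # heuristic bandwidth: 1 / number of variables
    lam, v = leading_eigenpair(center_kernel(rbf_kernel(z, gamma)))
    # Projection of case i onto the first component is sqrt(lam) * v[i].
    return [math.sqrt(lam) * x for x in v]


# Simulated (hypothetical) data: 8 variables, all non-linear functions of one latent t.
rng = random.Random(1)
data = []
for _ in range(40):
    t = rng.uniform(-2, 2)
    data.append([t, t ** 2, math.sin(t), t ** 3, math.exp(t / 2),
                 abs(t), math.tanh(t), t + rng.gauss(0, 0.1)])
scores = kernel_pca_composite(data)
```

Unlike the first linear principal component, this score is not a weighted sum of the 8 variables, which makes it harder to interpret, and it depends on the kernel and on `gamma`, which would themselves need tuning.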

I would also like to know whether it is possible to develop the "composite score" and select a regression model to predict it simultaneously, within a single cross-validated model-building procedure.
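One way to develop the composite and select the regression model within a single cross-validation loop is to treat the composite construction as part of the model. In every fold, the standardization and first-principal-component weights are estimated from the training cases only, the held-out cases are scored with those weights, and the candidate regression models are compared on the held-out composite. This avoids the optimistic bias that arises when the composite is built on all the data before cross-validating. The pure-Python sketch below assumes linear PCA for the composite, ridge regression on the 50 predictors, 5 folds, a small penalty grid, and simulated data. All of these are illustrative choices, and a non-linear method could replace step 1.

```python
import math
import random


def col_stats(rows):
    """Column means and standard deviations (an sd of 0 is replaced by 1)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) or 1.0
           for j in range(p)]
    return means, sds


def scale(rows, means, sds):
    """Standardize rows with statistics estimated on the training fold."""
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def first_pc_weights(z, iters=500):
    """Loadings of the first principal component of centred data (power iteration)."""
    n, p = len(z), len(z[0])
    cov = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)] for a in range(p)]
    w = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * w[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return w if sum(w) >= 0 else [-x for x in w]  # orient: higher score = higher outcomes


def composite_scores(rows, means, sds, w):
    """Composite score: weighted sum of the standardized outcomes."""
    return [sum(wj * zj for wj, zj in zip(w, z)) for z in scale(rows, means, sds)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def cv_composite_ridge(outcomes, predictors, alphas, k=5, seed=0):
    """Pooled held-out MSE per ridge penalty; the composite is rebuilt in every fold."""
    n = len(outcomes)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    mse = {a: 0.0 for a in alphas}
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # 1) Composite: scaling and PC1 weights come from the training fold only.
        om, osd = col_stats([outcomes[i] for i in train])
        w = first_pc_weights(scale([outcomes[i] for i in train], om, osd))
        y_tr = composite_scores([outcomes[i] for i in train], om, osd, w)
        y_te = composite_scores([outcomes[i] for i in test], om, osd, w)
        # 2) Predictors: standardize with training statistics; build X'X once per fold.
        pm, psd = col_stats([predictors[i] for i in train])
        X_tr = scale([predictors[i] for i in train], pm, psd)
        X_te = scale([predictors[i] for i in test], pm, psd)
        p = len(X_tr[0])
        XtX = [[sum(r[a] * r[b] for r in X_tr) for b in range(p)] for a in range(p)]
        Xty = [sum(r[a] * y for r, y in zip(X_tr, y_tr)) for a in range(p)]
        # 3) Ridge fit per penalty; y_tr and X_tr are centred, so no intercept is needed.
        for alpha in alphas:
            A = [[XtX[i][j] + (alpha if i == j else 0.0) for j in range(p)] for i in range(p)]
            beta = solve(A, Xty)
            preds = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X_te]
            mse[alpha] += sum((yh - y) ** 2 for yh, y in zip(preds, y_te)) / n
    return mse


# Simulated (hypothetical) data: 100 cases, 50 predictors, 8 outcomes driven by one latent.
rng = random.Random(42)
predictors = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(100)]
outcomes = []
for x in predictors:
    latent = 0.8 * x[0] - 0.5 * x[1] + 0.3 * x[2] + rng.gauss(0, 0.5)
    outcomes.append([latent + rng.gauss(0, 0.5) for _ in range(8)])

mse = cv_composite_ridge(outcomes, predictors, alphas=[0.1, 1.0, 10.0, 100.0])
best_alpha = min(mse, key=mse.get)
```

Because the composite's weights are re-estimated in each fold, the held-out error reflects the uncertainty in building the composite as well as in fitting the regression. After the penalty is chosen, both steps would be refit on the full data.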

I would be grateful for any advice or references.

Stats_Monkey
  • Can you explain in more detail why you are not happy with PCA/FA approaches? It is not really clear to me what you mean by *"find[ing] a cross-validate model which maximizes prediction accuracy by modelling both feature variables and development of composite measure simultaneously"*; maybe you could try to elaborate. – amoeba Jan 12 '15 at 21:01
  • Thanks for your reply. You are right that my sentence does not make much sense; I will try to elaborate on my point: – Stats_Monkey Jan 13 '15 at 10:11
  • I am interested in whether there are alternatives to PCA/FA that allow one to model more features that are non-linearly related to the input variables. I am aware of non-linear PCA, but as a classical statistician I would like to know whether there are other methods from the field of machine learning. – Stats_Monkey Jan 13 '15 at 10:23
  • General remark: instead of (or in addition to) providing clarifications in the comments, I suggest you edit your question to clarify/expand it. This puts it on the front page again and makes more people look at it again. – amoeba Jan 13 '15 at 11:23
  • Good idea! I have updated my original question. – Stats_Monkey Jan 13 '15 at 14:03
  • Why do you need to use machine learning? To work through 50 predictors? To automate a process that will be repeated? I'm no machine learning expert, but from my perspective composite scores, indices, or constructs should be created using a strongly theory-driven, not data-driven, process, unless all you are trying to do is data reduction. This is because there is often no way to validate most behavioral/psychometric constructs. They are abstract constructs that someone defines with theory. Measures are created and assessed for construct validity, but if you have no theory they are meaningless. – robin.datadrivers Jan 13 '15 at 19:32
  • Dear Robin, thanks for your comments. I agree that such a data-driven composite score is not validated and that I do not know what it really measures; perhaps I should have used a different example for my question, one where data reduction is the aim. I am mainly interested in what kinds of approaches the machine learning community uses, to see if there is anything to learn from them. – Stats_Monkey Jan 14 '15 at 11:39
  • ML has a lot of latent variable models. If the way you want to approach this is "I know there is a latent state which I can measure only indirectly through 8 variables", the answer depends very much on your assumptions about the latent variable. Is it discrete? Is it continuous? Is it a process over time? And perhaps more generally, is there value in estimating the latent state, or can't you build a solid multivariate model estimating the 8 variables directly? – means-to-meaning Jan 16 '15 at 19:26
  • Thanks, means-to-meaning, this is very helpful. My latent variable would be continuous, and there is theoretical reasoning that there is only one latent variable, but it may change over time (there are only a few measurements, and currently I would only assume one time point). However, in general I do not need to estimate the latent state, and building a "solid multivariate model" is preferable. May I ask you to provide a reference for both models? Many thanks in advance! – Stats_Monkey Jan 20 '15 at 09:16
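The two options contrasted in the comments above can be written out under the assumptions Stats_Monkey gives (one continuous latent variable, one time point) plus linear relationships, which is an added assumption. The notation is introduced here for illustration only: $x_{ij}$ is cognitive variable $j$ for person $i$, and $\mathbf{z}_i$ is the vector of about 50 predictors.

$$
\begin{aligned}
\text{Latent variable (MIMIC-type):}\quad & x_{ij} = \mu_j + \lambda_j \eta_i + \varepsilon_{ij}, \qquad j = 1, \dots, 8, \\
& \eta_i = \mathbf{z}_i^\top \boldsymbol{\gamma} + \zeta_i, \\
\text{Multivariate regression:}\quad & \mathbf{x}_i = \boldsymbol{\mu} + \mathbf{B}^\top \mathbf{z}_i + \mathbf{e}_i .
\end{aligned}
$$

Substituting the second line into the first gives $x_{ij} = \mu_j + \lambda_j \mathbf{z}_i^\top \boldsymbol{\gamma} + (\lambda_j \zeta_i + \varepsilon_{ij})$, which is the multivariate regression with the rank-one constraint $\mathbf{B} = \boldsymbol{\gamma}\boldsymbol{\lambda}^\top$. In this linear setting the latent-variable route is therefore a restricted (reduced-rank) version of modelling the 8 variables directly.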

1 Answer


This post discusses composite variables: http://www.r-bloggers.com/ecological-sems-and-composite-variables-what-why-and-how/. In the R package lavaan, you can create a composite variable based on manifest variables (indicators). This composite variable can then be treated as a dependent or an independent variable, depending on the question of interest.

Alph
  • Thanks, Phil, for your suggestions and the link. However, I wanted to know whether there are machine learning alternatives to latent variable modelling, or to composite scores based on latent variable models, that allow complex non-linear relationships to be modelled. – Stats_Monkey Jan 13 '15 at 10:19