1

I would like to find out which variables (from a set of 30 binary variables) have the most impact on an ordinal satisfaction measurement (it can reach from 1 - not happy at all to 4 - absolutely happy). Unfortunately most of the binary independent variables are (highly) correlated.

There are about 20 different shops to sell the product and I also want to check if different customer-types have different drivers.

My dataset looks like this (with D1 to D30 being the dichotomous independent Variables):

Data structur

I wanted to use a hierarchical regression, but I think it will not be appropriate for the ordinal dependent variable. Another problem might be the high correlation between the binary independent variables.

So now I read about random forest classification, but I am not sure if this is the right way to go?

Do you have any suggestions about a proper method for my problem? And more generally, are there any methods to deal with high correlation in binary predictors?

T.E.G.
  • 1,676
  • 6
  • 16
  • 29
TinglTanglBob
  • 878
  • 1
  • 8
  • 13
  • This question might be helpful: http://stats.stackexchange.com/questions/16331/doing-principal-component-analysis-or-factor-analysis-on-binary-data-using-spss – T.E.G. Feb 12 '17 at 04:15
  • I've tried extracting factors, using Tetrachoric correlation, allready, but the problem is, that i want to know which single binary variables have the most influence. So factorising the data doesn't deliver the desired results. At the moment i am using varimp from the party package (but i am not really sure if it can be used whith binary variables only). So any further ideas would be very wellcome! – TinglTanglBob Feb 20 '17 at 09:45

0 Answers0