3

I've done some driver importance analyses with the relaimpo package in R. However, the "normal" Shapley value regressions/driver analyses/Kruskal analyses (whatever you want to name them) require a metric dependent variable, because they are built on linear regression.

I have a new dataset, where I have a dependent variable with two values (0/1) and want to assess the relative importance of 10 metric independent variables.

Is anyone aware of a way to do such a driver analysis with a binary dependent variable, or of a different approach to assessing relative importance?

Thanks.

deschen
  • 479
  • 3
  • 12

2 Answers

1

You can fit a logistic regression or a random forest classifier and then examine which variables are important. In R, the importance() function gives you the relative importance of the variables.

yosemite_k
  • 115
  • 3
  • The concept of importance in Shapley regression is very different to that in a Random Forest (a Random Forest will find fewer variables as being more important, all else being equal). And, the ``importance`` function you refer to is not shipped in ``base`` R. – Tim Mar 01 '17 at 03:13
  • can you explain more, or add some supporting reference? – yosemite_k Mar 02 '17 at 13:53
  • Sure. The references are in the answer below. Feel free to up-vote after you have read the reference. – Tim Mar 06 '17 at 03:56
1

Relative Importance Analysis gives essentially the same results as Shapley (but not as Kruskal). A variant of Relative Importance Analysis has been developed for binary dependent variables.

However, binary variables are arguably numeric, and I'd be shocked if you got a meaningfully different result from using a standard Shapley regression with your data.
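To make the idea of running a standard Shapley regression on a 0/1 outcome concrete, here is a minimal sketch in plain Python (not relaimpo, and not a replacement for it). It treats the 0/1 variable as numeric, fits ordinary least squares on every subset of predictors, and averages each predictor's marginal gain in R² with Shapley weights. The toy data and the names `r_squared` and `shapley_r2` are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def r_squared(columns, y):
    """R^2 of an OLS fit of y on the given columns plus an intercept."""
    n = len(y)
    y_mean = sum(y) / n
    ss_total = sum((v - y_mean) ** 2 for v in y)
    if not columns:
        return 0.0  # intercept-only model explains nothing
    # Centre predictors and outcome so the intercept drops out.
    cols = [[v - sum(c) / n for v in c] for c in columns]
    yc = [v - y_mean for v in y]
    k = len(cols)
    xty = [sum(a * b for a, b in zip(cols[i], yc)) for i in range(k)]
    # Augmented normal equations [X'X | X'y].
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
         + [xty[i]] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[pivot] = A[pivot], A[i]
        for r in range(k):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    # With centred data, the explained sum of squares is beta' X'y.
    ss_explained = sum(b * t for b, t in zip(beta, xty))
    return ss_explained / ss_total


def shapley_r2(X, y):
    """Shapley share of R^2 for each predictor in X (dict: name -> values)."""
    names = list(X)
    p = len(names)
    cache = {}

    def value(subset):
        key = frozenset(subset)
        if key not in cache:
            cache[key] = r_squared([X[n] for n in sorted(key)], y)
        return cache[key]

    shares = {}
    for name in names:
        others = [n for n in names if n != name]
        total = 0.0
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                total += weight * (value(subset + (name,)) - value(subset))
        shares[name] = total
    return shares


# Hypothetical toy data: three metric predictors and a 0/1 outcome.
X = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0],
    "x3": [5.0, 3.0, 6.0, 2.0, 7.0, 1.0, 8.0, 4.0],
}
y = [0, 0, 0, 1, 0, 1, 1, 1]

shares = shapley_r2(X, y)
full_r2 = r_squared(list(X.values()), y)
for name, share in shares.items():
    print(f"{name}: {share:.3f} ({share / full_r2:.1%} of R^2)")
```

The shares add up to the R² of the full model, so they can be read as a decomposition of explained variance. With 10 predictors this fits 2^10 = 1024 small regressions, which is still cheap.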

Tim
  • 3,255
  • 14
  • 24