I've been asked to build a model using gradient boosting or random forest. So far so good. However, the only variable importance output I get back is based on the number of times a variable was used as a branch rule. I've now been asked to get coefficients, or to otherwise quantify the impact that the variables have on the target. Is there a way to do this with a gradient boosting model? My other thought was to take only the variables that were shown to be used as branch rules and use them in a regular decision tree, a GLM, or a regular regression model.

Any help or ideas would be appreciated! Thanks so much!

jswtraveler

1 Answer

Permutation-based variable importance (VI) can be implemented for any bagged ensemble. The neat thing about an RF model is that, because it is already bagged, you get the out-of-bag error estimate and VI at almost no extra computational cost.

You can compute VI for any model, gradient boosting models included, if you are willing to bootstrap the model (say, 50 times) and aggregate the results. If needed, I could write you a short VImyModel wrapper for R, [here].
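To make the permutation idea concrete, here is a minimal sketch in plain Python. It is a simplified variant, not the out-of-bag procedure itself: it shuffles one column at a time on a single evaluation set and records how much the error increases, averaged over 50 shuffles. The names `permutation_importance`, `mse`, and `toy_predict` are hypothetical, and `toy_predict` only stands in for a fitted GBM or RF prediction function.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=50, seed=0):
    """Mean increase in MSE when a single column of X is shuffled.

    predict: function mapping a list of rows to a list of predictions.
    X: list of rows (each row a list of feature values), ideally held-out data.
    """
    rng = random.Random(seed)
    baseline = mse(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only; all other columns stay as observed.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical stand-in for a fitted model: uses feature 0, ignores feature 1.
def toy_predict(X):
    return [3.0 * row[0] + 0.0 * row[1] for row in X]


data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = toy_predict(X)

vi = permutation_importance(toy_predict, X, y)
print(vi)  # feature 0 gets a positive score, the ignored feature 1 gets zero
```

With a real model you would pass its prediction function in place of `toy_predict` and use data the model was not trained on, so the scores reflect predictive value rather than memorization.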

You should look into the structure of your trained models. The model surfaces of non-linear, non-additive models cannot be described with coefficients the way their GLM hyperplane counterparts can. These answers should bring you up to date on a general black-box method (partial dependence) and on some tree-specific methods.
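Partial dependence can be sketched in a few lines: fix one feature at each value on a grid, leave every other feature as observed, and average the predictions. The code below is only an illustration; `partial_dependence` and `toy_predict` are hypothetical names, and the toy model is a deliberately non-additive interaction standing in for a fitted RF or GBM.

```python
def partial_dependence(predict, X, feature, grid):
    """Average prediction when `feature` is forced to each value in `grid`.

    predict: function mapping a list of rows to a list of predictions.
    X: list of rows (each row a list of feature values).
    """
    curve = []
    for value in grid:
        # Overwrite one feature for every row; keep the rest as observed.
        X_fixed = [row[:feature] + [value] + row[feature + 1:] for row in X]
        preds = predict(X_fixed)
        curve.append(sum(preds) / len(preds))
    return curve


# Hypothetical stand-in for a fitted model: a pure interaction, x0 * x1.
def toy_predict(X):
    return [row[0] * row[1] for row in X]


X = [[0.0, 1.0], [1.0, 3.0], [2.0, 2.0]]
pd_curve = partial_dependence(toy_predict, X, feature=0, grid=[0.0, 1.0, 2.0])
print(pd_curve)  # [0.0, 2.0, 4.0]
```

The resulting curve is the closest analogue to a coefficient that such a model offers: it shows the average effect of one variable over its range, but it averages over interactions, so a flat curve does not prove the variable is unused.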

If the simple linear model or single tree seems, by some measure, just as useful as the RF or GBT, why use the advanced models in the first place? If the simplified model performs much worse, why settle for a good description of the model you should NOT pick? For that reason, I prefer never to use a stand-in dummy model to explain the more advanced model. Spend your time learning the techniques for understanding the advanced models.

Inspecting your model structure is important. Sometimes you will realize that your advanced model is exploiting a data leak to make accurate predictions, and such a model may not be useful in practice. Maybe the model is using patient IDs to predict cancer. If patient IDs are assigned after diagnosis, according to where the patient is hospitalized, then a model that predicts patients in palliative hospices are the most likely to have cancer is not useful in any way. Other times you learn that the model exploits a relationship you had not thought of yourself, and there seem to be plausible causal explanations for the link. Hurray! Go investigate further.