
I compared variable importance output from two approaches in R: random forests via rfsrc() and gradient boosting via gbm(). The important factors reported by the two approaches are totally different.

Since the importance of factors in gradient boosting depends on the order in which the trees are built, are the rankings somewhat arbitrary? How should we interpret the important influencers reported by gbm()?
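For reference, here is a minimal sketch of the kind of comparison I ran (simulated placeholder data and settings, not my actual dataset or tuning):

```r
# Sketch only: simulated data standing in for the real dataset
library(randomForestSRC)
library(gbm)

set.seed(1)
df <- data.frame(x1 = rnorm(500), x2 = rnorm(500), x3 = rnorm(500))
df$y <- 2 * df$x1 + df$x2 + rnorm(500)

# Random forest variable importance (permutation VIMP)
rf_fit <- rfsrc(y ~ ., data = df, ntree = 500, importance = TRUE)
print(rf_fit$importance)

# Gradient boosting variable importance (relative influence)
gbm_fit <- gbm(y ~ ., data = df, distribution = "gaussian",
               n.trees = 500, interaction.depth = 2, shrinkage = 0.05)
print(summary(gbm_fit, plotit = FALSE))
```

Note that the two outputs are not even on the same scale (permutation VIMP versus relative influence), which presumably makes them hard to compare directly.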

  • Even though random forest uses decision trees, and gradient boosting typically uses decision trees, the models themselves are estimated using completely different procedures. Why would you expect two different procedures to produce the same result? – Sycorax Apr 07 '21 at 15:06
  • I would think the importance of a predictor should be relatively robust to the model, especially if it is a strong predictor. The important influencers provided by gbm() are quite unexpected. – hehe Apr 07 '21 at 15:26
  • But the measurement of the strength/weakness of a predictor depends on the model. Would you expect a linear regression and a decision tree to find the same strong predictors? Why or why not? – Sycorax Apr 07 '21 at 15:37
  • To see how the principles noted by @Sycorax apply in your case, please provide a lot more details about the modeling processes: how large a data set, how many predictors, collinearity among predictors, the depths of interactions allowed, other critical parameter settings for training the models, and whether the important predictors returned by one model might have been correlated with those returned by the other model. – EdM Apr 07 '21 at 16:32
  • Thanks. Since the question is closed as a duplicate, I will not add to it. It might be due to the correlation among predictors. Based on the answer to the other similar question, it sounds like the important factors returned by gradient boosting are quite arbitrary, depending on which feature is selected first in the modeling process. Is the ranking of the important factors really meaningful in this case, or should we just focus on the prediction? – hehe Apr 08 '21 at 18:01
  • @hehe I posted some of my thoughts [here](https://stats.stackexchange.com/a/202853/28500) a few years ago. – EdM Apr 09 '21 at 03:08
  • @EdM Thanks again. This looks complicated; I will think about it. – hehe Apr 09 '21 at 03:29

1 Answer


Whether or not the importances from gradient boosting are strongly influenced by randomness depends on the hyperparameter configuration of the gradient boosting model.

  • A gradient boosted model that uses random subsampling of rows or features (or other randomized components) will estimate feature importances that vary, to a greater or lesser degree, across repeated model fits.

  • A gradient boosted model that does not have any randomized components is deterministic, because it selects the best feature at each split. (The handling of ties might be a source of non-deterministic behavior if a seed is not set.) In this case, the deterministic nature of the model means that the importances cannot be arbitrary ("based on random choice or personal whim, rather than any reason or system"). The sketch below illustrates both cases.
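As a quick illustration, here is a sketch with simulated data. In gbm(), row subsampling is controlled by the bag.fraction argument (default 0.5); setting it to 1 removes the randomized component:

```r
# Sketch: bag.fraction controls the randomness of gbm()'s
# relative-influence estimates (simulated data for illustration)
library(gbm)

set.seed(1)
df <- data.frame(x1 = rnorm(500), x2 = rnorm(500), x3 = rnorm(500))
df$y <- 2 * df$x1 + df$x2 + rnorm(500)

rel_inf <- function(seed, bag) {
  set.seed(seed)
  fit <- gbm(y ~ ., data = df, distribution = "gaussian",
             n.trees = 300, interaction.depth = 2,
             shrinkage = 0.05, bag.fraction = bag)
  summary(fit, plotit = FALSE)  # data frame of relative influence
}

# With subsampling (the default), the importances vary with the seed:
rel_inf(seed = 1, bag = 0.5)
rel_inf(seed = 2, bag = 0.5)

# Without subsampling, repeated fits are deterministic,
# so the importances are identical across seeds:
rel_inf(seed = 1, bag = 1)
rel_inf(seed = 2, bag = 1)
```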

Sycorax
  • Thanks, this makes sense. Do you know which behavior gbm() uses by default, and which option sets this hyperparameter in that function? – hehe Apr 09 '21 at 03:23