
I have a dataset with both discrete and continuous variables, on which I fit a random forest model using the R package randomForest. I have read several times that the output of an RF model is a "black box".

I know that I can get the splitting points of the numerical variables for each individual tree. However, I would like to know whether I can extract a "representative" splitting point that I could use in general (that is, not specific to a single tree).

gung - Reinstate Monica
Eva Jo

1 Answer


Not really. Every split after the first is conditional on the splits above it in the tree, so even if you only look at splits on one variable, the thresholds don't all mean the same thing: it's the difference between $X|Y$ and $X|Z$, etc. Imagine that you're classifying points in a circle like in this example. If you're examining $x_1$, sometimes you split near 1 and sometimes near -1, and pooling those thresholds across nodes and trees (for instance by averaging them) gives a number that is hard to make sense of out of context. @gung's suggestion to look at this thread is a good one.
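To make the circle argument concrete, here is a minimal sketch in plain Python. It does not use randomForest; it just assumes a regular grid on $[-2, 2]^2$, labels points inside the unit circle as 1, and finds the single best Gini split on $x_1$ in a few different contexts. The helper names (`gini`, `best_split`, `restrict`) and the grid itself are made up for this illustration.

```python
from itertools import product

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(points, labels, feature):
    """Midpoint threshold on `feature` with the lowest weighted Gini impurity."""
    values = sorted({pt[feature] for pt in points})
    best_t, best_score = None, float("inf")
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [y for pt, y in zip(points, labels) if pt[feature] <= t]
        right = [y for pt, y in zip(points, labels) if pt[feature] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:  # strict "<" keeps the first threshold on ties
            best_t, best_score = t, score
    return best_t

def restrict(points, labels, keep):
    """Keep only the (point, label) pairs for which keep(point) is true."""
    pairs = [(pt, y) for pt, y in zip(points, labels) if keep(pt)]
    return [pt for pt, _ in pairs], [y for _, y in pairs]

# Assumed toy data: a 17 x 17 grid, class 1 inside the unit circle.
grid = [i * 0.25 for i in range(-8, 9)]
points = list(product(grid, grid))
labels = [1 if x1 * x1 + x2 * x2 <= 1.0 else 0 for x1, x2 in points]

# Best split on x1 (feature 0) in four different "contexts".
root_t = best_split(points, labels, 0)

pts, labs = restrict(points, labels, lambda p: p[0] > -0.875)
right_t = best_split(pts, labs, 0)   # after an earlier split x1 > -0.875

pts, labs = restrict(points, labels, lambda p: p[0] <= 0.875)
left_t = best_split(pts, labs, 0)    # after an earlier split x1 <= 0.875

pts, labs = restrict(points, labels, lambda p: p[1] >= 0.75)
slab_t = best_split(pts, labs, 0)    # after an earlier split x2 >= 0.75

print(root_t, right_t, left_t, slab_t)
```

On this grid the unconditional best threshold on $x_1$ is $\pm 0.875$ (the two are exactly tied by symmetry), the node reached through $x_1 > -0.875$ splits at $0.875$, the node reached through $x_1 \le 0.875$ splits at $-0.875$, and the node reached through $x_2 \ge 0.75$ splits at $\pm 0.625$. Averaging these thresholds would give something near 0, which is not a useful cut point for any of the nodes.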

Sycorax