
I am working on a project using boosted trees and random forests. A question was raised about the risk of interpolation on the testing data: namely, do these models interpolate when predicting on new samples? If the training data is not dense enough in some region of the feature space, how does the model make predictions there? Thanks!

Rob
    Boosted trees and random forests do not interpolate on new or on training data. You might care to read up on how tree-based models work, as that may clear up your confusion. – jbowman Jun 04 '18 at 14:31
  • I can see where the confusion comes from, though, on thinking more about it! – jbowman Jun 05 '18 at 03:35
  • A regression tree typically has a constant model for each leaf. (Some people fit linear regression models on each leaf instead.) The prediction within a leaf is just that constant, not an interpolating function. A classification tree assigns a single class to each leaf, so interpolation doesn't make sense there either. – EngrStudent Jun 05 '18 at 13:34
  • Thanks for the clarification! That makes sense if every point in a leaf shares the same label and the tree-splitting procedure partitions the entire space. This also suggests that tree models are bad at extrapolation. Extending this to neural networks: do they have the same problem when the training data is not dense? – Rob Jun 11 '18 at 18:28

1 Answer


The comments contain a good exchange, which goes most of the way to an answer.

The core idea is that standard tree-based models like random forests don't do any interpolation. The predictions of a random forest regression are piecewise constant, because each leaf predicts a constant value and the forest averages those constants. The predictions of a random forest classifier are "votes" for one class or another by each tree, for each sample.
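You can see this directly in a minimal sketch (assuming scikit-learn, which the answer doesn't name; the data and parameters here are illustrative). A forest fit on one feature produces a step function, and queries outside the training range fall into the same boundary leaves, so the prediction stays flat rather than extrapolating:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 1, size=(200, 1))      # training inputs on [0, 1]
y_train = np.sin(2 * np.pi * X_train.ravel())   # a smooth target

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Inside [0, 1] the prediction is a step function (an average of leaf
# constants); beyond the training range it is flat, because every split
# threshold lies inside [0, 1] and all large x land in the same leaves.
X_test = np.array([[0.25], [0.5], [1.0], [1.5], [10.0]])
print(forest.predict(X_test))  # the last three values are identical
```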

If you average the effect of all of these trees, you end up with some "gradations" of performance in the feature space. For example, a plot of random forest predictions for identifying points inside the unit circle (including some noise features) looks like this:

[Figure: random forest class predictions over the two signal features for the unit-circle problem, showing axis-aligned steps in confidence]

We can see that the model is a little less confident near the edges of the square, and the effect of having two signal features and several noise features is that the axis-aligned "boxes" lead to erroneous predictions. These are all artifacts of averaging binary predictions on noisy data.
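For reference, a sketch of an experiment like the one plotted above (again assuming scikit-learn; the exact number of noise features and the sampling range are not given in the answer, so the values here are assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000
X_signal = rng.uniform(-1.5, 1.5, size=(n, 2))  # two signal features
X_noise = rng.uniform(-1.5, 1.5, size=(n, 5))   # several noise features
X = np.hstack([X_signal, X_noise])
y = (X_signal[:, 0] ** 2 + X_signal[:, 1] ** 2 < 1).astype(int)  # inside circle?

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Class probabilities are the fraction of trees voting for each class,
# so they change in axis-aligned steps rather than varying smoothly.
grid = np.array([[0.0, 0.0], [0.9, 0.0], [1.1, 0.0], [1.4, 1.4]])
queries = np.hstack([grid, np.zeros((len(grid), 5))])  # zero out noise dims
print(clf.predict_proba(queries)[:, 1])  # P(inside circle) at each query
```

The vote fractions from `predict_proba` are exactly the "gradations" described above: each tree contributes a hard 0/1 decision, and only the averaging across trees softens the boundary.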

Sycorax