
I came across several posts and papers claiming that pruning the trees in a "bagging" ensemble of trees is not needed (see 1).

However, is it necessarily damaging (or at least damaging in some known cases) to prune the individual trees in an ensemble (say, using the OOB sample)?

Thanks!

Tal Galili


Tal,

Generally speaking, pruning will hurt performance of bagged trees.

Trees are unstable classifiers, meaning that if you perturb the data a little, the tree might change significantly. They are low-bias but high-variance models. Bagging generally works by "replicating" the model to drive the variance down (the old "increase your sample size" trick).
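As an idealized illustration of that trick: if the B bagged predictions at a point x were independent, each with variance σ², averaging them would divide the variance by B. The notation \hat f_b(x) for the b-th tree's prediction is introduced here for illustration, and independence is an assumption; bootstrap trees are fit to overlapping samples, so this is a best case.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\right)
  = \frac{1}{B^2}\sum_{b=1}^{B}\operatorname{Var}\!\big(\hat f_b(x)\big)
  = \frac{\sigma^2}{B}
```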

However, if you end up averaging models that are very similar, you don't gain much. Unpruned trees tend to be more different from one another than pruned ones. This has the effect of "decorrelating" the trees, so that you are averaging trees that are not overly similar. This is also why random forests add the extra tweak of random predictor selection at each split, which forces the trees to be even more different.
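The decorrelation argument can be made concrete with the standard identity for the average of B predictions that each have variance σ² and share pairwise correlation ρ: its variance is ρσ² + (1 - ρ)σ²/B. Adding more trees only shrinks the second term, so ρ sets a floor. The sketch below assumes idealized, equicorrelated Gaussian predictions (not real trees) and checks the formula by Monte Carlo; the function names are hypothetical.

```python
import math
import random
import statistics


def averaged_variance(sigma2, rho, n_models):
    """Variance of the mean of n_models predictions that each have variance
    sigma2 and share pairwise correlation rho."""
    return rho * sigma2 + (1 - rho) * sigma2 / n_models


def simulated_variance(rho, n_models, n_trials, rng):
    """Monte Carlo estimate using unit-variance predictions built as
    sqrt(rho) * shared + sqrt(1 - rho) * own, so every pair has correlation rho."""
    a, c = math.sqrt(rho), math.sqrt(1 - rho)
    averages = []
    for _ in range(n_trials):
        shared = rng.gauss(0, 1)  # component common to all models
        preds = [a * shared + c * rng.gauss(0, 1) for _ in range(n_models)]
        averages.append(sum(preds) / n_models)
    return statistics.pvariance(averages)


rng = random.Random(42)
results = {}
for rho in (0.1, 0.5, 0.9):
    for n_models in (1, 10, 100):
        theory = averaged_variance(1.0, rho, n_models)
        sim = simulated_variance(rho, n_models, 5000, rng)
        results[(rho, n_models)] = (theory, sim)
        print(f"rho={rho:.1f} B={n_models:3d} theory={theory:.3f} simulated={sim:.3f}")
```

With ρ = 0.9 the variance stays near 0.9 however many models are averaged, while with ρ = 0.1 it falls toward 0.1. Lowering the correlation between trees is what lowers that floor.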

Using unpruned trees increases the risk of overfitting, but model averaging more than offsets this (generally speaking).
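A small, self-contained sketch of that trade-off, under illustrative assumptions: a toy 1-D regression problem (y = sin(x) plus Gaussian noise with standard deviation 0.3), a hand-rolled greedy regression tree rather than any library's implementation, and 50 bootstrap trees. All function names are hypothetical. The single unpruned tree reproduces its noisy training targets exactly (overfitting), while the average of 50 equally unpruned bagged trees should land closer to the noise-free sin(x) on fresh points.

```python
import math
import random


def fit_tree(xs, ys, max_depth=None, depth=0):
    """Greedy 1-D regression tree with SSE-minimizing splits.

    max_depth=None grows the tree until each leaf holds a single distinct x
    value (an unpruned tree). An integer cap is only a crude stand-in for
    pruning, not OOB or cost-complexity pruning.
    """
    mean = sum(ys) / len(ys)
    if len(ys) < 2 or (max_depth is not None and depth >= max_depth):
        return mean
    pts = sorted(zip(xs, ys))
    n = len(pts)
    total, total_sq = sum(ys), sum(y * y for y in ys)
    best_sse, best_i = None, None
    left_sum = left_sq = 0.0
    for i in range(1, n):
        y = pts[i - 1][1]
        left_sum += y
        left_sq += y * y
        if pts[i - 1][0] == pts[i][0]:
            continue  # cannot split between identical x values
        right_sum, right_sq = total - left_sum, total_sq - left_sq
        sse = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if best_sse is None or sse < best_sse:
            best_sse, best_i = sse, i
    if best_i is None:
        return mean  # all x identical: nothing left to split
    threshold = (pts[best_i - 1][0] + pts[best_i][0]) / 2
    lx, ly = zip(*pts[:best_i])
    rx, ry = zip(*pts[best_i:])
    return (threshold,
            fit_tree(lx, ly, max_depth, depth + 1),
            fit_tree(rx, ry, max_depth, depth + 1))


def predict(tree, x):
    """Walk from the root to a leaf; internal nodes are (threshold, left, right)."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def fit_bagged(xs, ys, n_trees, rng, max_depth=None):
    """Fit n_trees trees, each on a bootstrap sample drawn with replacement."""
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_tree([xs[i] for i in idx], [ys[i] for i in idx], max_depth))
    return trees


def predict_bagged(trees, x):
    """Bagged prediction: plain average of the individual trees."""
    return sum(predict(t, x) for t in trees) / len(trees)


def mse(f, xs, targets):
    return sum((f(x) - t) ** 2 for x, t in zip(xs, targets)) / len(xs)


rng = random.Random(0)
x_train = [rng.uniform(0, 2 * math.pi) for _ in range(200)]
y_train = [math.sin(x) + rng.gauss(0, 0.3) for x in x_train]  # noisy targets
x_test = [rng.uniform(0, 2 * math.pi) for _ in range(1000)]
truth = [math.sin(x) for x in x_test]  # noise-free function to recover

single = fit_tree(x_train, y_train)
bagged = fit_bagged(x_train, y_train, 50, rng)

train_err = mse(lambda x: predict(single, x), x_train, y_train)
single_err = mse(lambda x: predict(single, x), x_test, truth)
bagged_err = mse(lambda x: predict_bagged(bagged, x), x_test, truth)
print(f"single unpruned tree: train MSE {train_err:.4f}, test MSE {single_err:.4f}")
print(f"50 bagged unpruned trees: test MSE {bagged_err:.4f}")
```

In one dimension, a fully grown tree on distinct x values amounts to nearest-neighbour interpolation of the training targets, which is why its training error is zero. Passing an integer `max_depth` to `fit_bagged` gives a depth-capped variant to experiment with; this toy setup makes no claim about how real pruning compares.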

HTH,

Max

topepo