Breiman says that the trees are grown without pruning. Why? There must be a solid reason why the trees in a random forest are not pruned, whereas pruning a single decision tree is considered very important to avoid overfitting. Is there any literature I can read on this? Of course the trees may not be correlated, but there is still a possibility of overfitting.
You really need to say more about the context here. @ChrisA. has made a notable attempt, but it's hard to know if your question is really answered, because it's difficult to know much about your quandary. – gung - Reinstate Monica Sep 14 '12 at 20:40
What more needs to be said? The question is very clear. – Seanosapien Oct 22 '17 at 21:38
2 Answers
Roughly speaking, some of the potential over-fitting that might happen in a single tree (which is a reason you do pruning generally) is mitigated by two things in a Random Forest:
- The fact that the samples used to train the individual trees are "bootstrapped".
- The fact that you have a multitude of random trees using random features and thus the individual trees are strong but not so correlated with each other.
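To make these two points concrete, here is a minimal sketch (my own illustration, not from Breiman; it assumes scikit-learn and a synthetic dataset) comparing a single unpruned tree with a forest of unpruned trees. Each tree in the forest is fit on a bootstrap sample and considers only a random subset of features at every split, which is what keeps the trees weakly correlated:

```python
# Hypothetical illustration: a single unpruned tree vs. a random forest of
# unpruned trees, evaluated on the same held-out split (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)  # full depth, no pruning
forest = RandomForestClassifier(
    n_estimators=500,        # many full-depth trees
    bootstrap=True,          # point 1: each tree sees a bootstrapped sample
    max_features="sqrt",     # point 2: random feature subset at each split
    random_state=0,
).fit(X_train, y_train)

print("single tree   train/test accuracy:",
      tree.score(X_train, y_train), tree.score(X_test, y_test))
print("random forest train/test accuracy:",
      forest.score(X_train, y_train), forest.score(X_test, y_test))
```

Typically both models fit the training data almost perfectly, but the forest's test accuracy is noticeably higher than the single tree's; that gap is the variance reduction the two points above buy you.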
Edit: based on OP's comment below:
There's definitely still potential for overfitting. As far as articles go, you can read about the motivation for "bagging" by Breiman and about "bootstrapping" in general by Efron and Tibshirani. As for point 2, Breiman derived a loose bound on generalization error that relates to the strength of the trees and the anti-correlation of the individual classifiers. Nobody uses the bound in practice (most likely), but it is meant to give intuition about what drives low generalization error in ensemble methods. This is in the Random Forests paper itself. My post was meant to push you in the right direction based on these readings and my own experience/deductions.
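For reference, the bound referred to above (Theorem 2.3 in Breiman's 2001 paper; quoted here from memory, so check the paper for the exact statement) has the form

$$\mathrm{PE}^{*} \;\le\; \frac{\bar{\rho}\,(1 - s^{2})}{s^{2}},$$

where $\mathrm{PE}^{*}$ is the generalization error of the forest, $s$ is the strength of the individual tree classifiers, and $\bar{\rho}$ is the mean correlation between them: stronger and less correlated trees give a smaller bound.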
- Breiman, L. (1996). "Bagging Predictors". Machine Learning 24 (2): 123–140.
- Efron, B.; Tibshirani, R. (1993). An Introduction to the Bootstrap. Boca Raton, FL: Chapman & Hall/CRC.
- Breiman, L. (2001). "Random Forests". Machine Learning 45 (1): 5–32.
But there may still be a possibility of overfitting. Can you cite an article to read for this? – Z Khan Sep 15 '12 at 19:12
@Z Khan Are you perhaps also [this Z Khan](http://stats.stackexchange.com/users/13806/z-khan)? If so, please let us know so we can merge your accounts. – whuber Sep 19 '12 at 17:41
@ZKhan The overfitting issue in RFs is covered in [Hastie et al. (2009) Elements of Statistical Learning, 2nd Edition](http://statweb.stanford.edu/~tibs/ElemStatLearn/). There is a free PDF available at the website for the book. Check out the chapter on random forests. – Gavin Simpson Dec 02 '13 at 19:23
A decision tree that is very deep, or grown to full depth, tends to learn the noise in the data. It overfits, giving low bias but high variance. Pruning is a suitable way to reduce overfitting in a single decision tree.
Random forests, however, generally perform well with trees grown to full depth. Because training uses bootstrap aggregation (sampling with replacement) together with a random selection of features at each split, the correlation between the trees (the weak learners) is low. So although the individual trees have high variance, the ensemble output behaves well: the deep trees keep the bias low, and averaging the largely uncorrelated trees reduces the variance.
If you still want to control training in a random forest, control the tree depth (for example via a maximum depth or a minimum leaf size) instead of pruning, as in the sketch below.
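A minimal sketch of what that looks like with scikit-learn (the parameter values are placeholders, not recommendations):

```python
# Hypothetical illustration: cap tree growth in the forest instead of
# post-pruning each tree (scikit-learn assumed).
from sklearn.ensemble import RandomForestClassifier

rf_full = RandomForestClassifier(
    n_estimators=300,
    max_depth=None,          # default: grow every tree to full depth
    random_state=0,
)
rf_limited = RandomForestClassifier(
    n_estimators=300,
    max_depth=8,             # cap the depth of every tree
    min_samples_leaf=5,      # and require a minimum number of samples per leaf
    random_state=0,
)
# Fit both on your data and compare out-of-bag or cross-validated scores
# to see whether limiting depth actually helps for your problem.
```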