15

Boosting algorithms, I would say, have evolved quite well. AdaBoost was introduced in 1995, followed some years later by the Gradient Boosting Machine (GBM). More recently, around 2015, XGBoost was introduced; it is accurate, handles overfitting well, and has won multiple Kaggle competitions. In 2017 Microsoft introduced LightGBM, which offers significantly lower training time than XGBoost. Yandex has also introduced CatBoost for handling categorical features.

Random Forest was introduced in the early 2000s, but have there been any worthy successors to it? I think that if a better bagging algorithm than Random Forest existed (one that can easily be applied in practice), it would have gained attention at places like Kaggle. Also, why did boosting become the more popular ensemble technique? Is it because you can build fewer trees to reach an optimal prediction?

Marius
  • AdaBoost was actually introduced in 1995, but that's a minor point that doesn't alter your fundamental thesis. – jbowman May 09 '18 at 21:52
  • Since random forests we've also seen the introduction of [extremely randomized trees](http://scikit-learn.org/stable/modules/ensemble.html#extremely-randomized-trees) (a minimal usage sketch follows these comments), although I'm not really aware of any good evidence that these consistently outperform random forests, so they may not be a "worthy" successor... – Jake Westfall May 10 '18 at 16:27
  • BART (https://arxiv.org/abs/0806.3286) is a Bayesian model that evolved from the single-tree Bayesian CART and is inspired by the classical ensemble methods. It's worth exploring. – Zen May 10 '18 at 16:43
  • Boosting became more popular because it handles many problems successfully using weak learners. – Refael May 22 '18 at 16:47
  • Regularized greedy forests could be worth mentioning (slow, but with some good results), as could quantile random forests for their useful side effects. – Michael M May 22 '18 at 17:32
  • Oblique forests were used by Laptev in 2014 to do what deep learning does, only about 3 orders of magnitude faster. I like the bivariate extension of the univariate split from classic CART used in oblique forests; it feels like an efficient multivariate approach. – EngrStudent Feb 10 '20 at 19:53
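To make the extremely-randomized-trees comment above concrete, here is a minimal sketch using the scikit-learn API linked there; the synthetic dataset and settings are illustrative assumptions, not a benchmark.

```python
# Minimal sketch (not a benchmark): extremely randomized trees next to a
# plain random forest on a synthetic dataset, via scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0)
# Extra trees also draw random feature subsets, but additionally pick the
# split thresholds at random instead of searching for the best cut point.
et = ExtraTreesClassifier(n_estimators=300, random_state=0)

print("Random forest:", cross_val_score(rf, X, y, cv=5).mean())
print("Extra trees  :", cross_val_score(et, X, y, cv=5).mean())
```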

1 Answer

3

XGBoost, CatBoost and LightGBM use some features of random forests (random sampling of variables/observations), so I think of them as successors of boosting and RF together, taking the best from both. ;)
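To illustrate this point, here is a minimal sketch (Python; xgboost and scikit-learn assumed installed) showing where the random-forest-style row and column subsampling appears as boosting hyperparameters; the dataset and parameter values are made up purely for illustration.

```python
# Minimal sketch: the row/column subsampling idea from random forests exposed
# as boosting hyperparameters. Dataset and values are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Random forest: bootstrapped rows, random feature subset at each split.
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt", random_state=0)

# XGBoost: the analogous knobs are subsample (rows per tree) and
# colsample_bytree (features per tree). LightGBM calls them
# bagging_fraction / feature_fraction; CatBoost uses subsample / rsm.
xgb = XGBClassifier(
    n_estimators=500,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=0,
)

rf.fit(X, y)
xgb.fit(X, y)
```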

PhilippPro