A family of algorithms combining weakly predictive models into a strongly predictive model. The most common approach is called gradient boosting, and the most commonly used weak models are classification/regression trees.
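As a concrete illustration of that description, here is a minimal sketch of gradient boosting for squared-error regression, in which each shallow regression tree is fit to the residuals (the negative gradient) of the current ensemble. The weak learner, learning rate, and depth below are illustrative assumptions, not part of the tag definition.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    # Start from the constant prediction, then repeatedly fit a shallow tree
    # to the current residuals and add a shrunken copy of it to the ensemble.
    baseline = y.mean()
    pred = np.full(len(y), baseline)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred                      # negative gradient of 0.5 * (y - pred)^2
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
    return baseline, trees

def predict_gradient_boosting(baseline, trees, X, learning_rate=0.1):
    # Sum the baseline and every tree's shrunken contribution.
    pred = np.full(X.shape[0], baseline)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred

Keeping max_depth small is what makes each tree "weak"; the predictive strength comes from adding many such trees with a small learning rate.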
Questions tagged [boosting]
1296 questions
287 votes · 8 answers
Bagging, boosting and stacking in machine learning
What are the similarities and differences between these 3 methods:
Bagging,
Boosting,
Stacking?
Which is the best one, and why?
Can you give me an example of each?

Bucsa Lucian (2,979)
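For readers who want runnable starting points alongside the definitions asked for above, here is a minimal sketch of the three ensembles in scikit-learn. The library, base estimators, and settings are assumptions chosen for illustration, not part of the question.

from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

models = {
    # Bagging: train the same learner on bootstrap resamples, then average the votes.
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0),
    # Boosting: fit learners sequentially, each one correcting the current errors.
    "boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
    # Stacking: fit diverse learners, then a meta-model on their predictions.
    "stacking": StackingClassifier(
        estimators=[("rf", RandomForestClassifier(random_state=0)),
                    ("gb", GradientBoostingClassifier(random_state=0))],
        final_estimator=LogisticRegression(max_iter=1000)),
}

for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())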
162 votes · 3 answers
Gradient Boosting Tree vs Random Forest
Gradient tree boosting as proposed by Friedman uses decision trees as base learners. I'm wondering whether we should make the base decision trees as complex as possible (fully grown) or simpler. Is there any explanation for this choice?
Random Forest is…

FihopZz (1,923)
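A quick way to see the contrast this question is about: boosting conventionally uses shallow (high-bias, low-variance) base trees, while a random forest grows deep trees and averages them. The scikit-learn setup below is an illustrative sketch with assumed parameter values, not taken from the question or its answers.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Gradient boosting: weak (shallow) base trees; strength comes from many stages.
gbm = GradientBoostingClassifier(n_estimators=300, max_depth=2, learning_rate=0.1)

# Random forest: strong (deep, low-bias) trees; variance is reduced by averaging.
rf = RandomForestClassifier(n_estimators=300, max_depth=None)

for name, model in [("gbm, shallow trees", gbm), ("rf, fully grown trees", rf)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())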
71 votes · 4 answers
How to tune hyperparameters of xgboost trees?
I have class-imbalanced data & I want to tune the hyperparameters of the boosted trees using xgboost.
Questions
Is there an equivalent of gridsearchcv or randomsearchcv for
xgboost?
If not, what is the recommended approach to tune the
parameters…

GeorgeOfTheRF (5,063)
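For context: xgboost ships a scikit-learn-compatible wrapper (XGBClassifier), so scikit-learn's GridSearchCV and RandomizedSearchCV work with it directly. The grid values below are illustrative assumptions, not tuning recommendations.

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from xgboost import XGBClassifier

# Synthetic imbalanced data, roughly 10% positives.
X, y = make_classification(n_samples=3000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)

param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [200, 400],
    "subsample": [0.8, 1.0],
}

search = GridSearchCV(
    XGBClassifier(),
    param_grid,
    scoring="roc_auc",      # a ranking metric is more informative than accuracy with imbalance
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)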
62 votes · 7 answers
Why doesn't Random Forest handle missing values in predictors?
What are the theoretical reasons for not handling missing values? Gradient boosting machines and regression trees handle missing values. Why doesn't Random Forest do the same?

Fedorenko Kristina (723)
59 votes · 6 answers
Is random forest a boosting algorithm?
Short definition of boosting:
Can a set of weak learners create a single strong learner? A weak learner is defined to be a classifier which is only slightly correlated with the true classification (it can label examples better than random…

Atilla Ozgur (1,251)
56 votes · 4 answers
What is the proper usage of scale_pos_weight in xgboost for imbalanced datasets?
I have a very imbalanced dataset. I'm trying to follow the tuning advice and use scale_pos_weight, but I'm not sure how I should tune it.
I can see that RegLossObj.GetGradient does:
if (info.labels[i] == 1.0f) w *= param_.scale_pos_weight
so a gradient…

ihadanny (2,596)
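A common heuristic, also given in the xgboost documentation, is to start from scale_pos_weight ≈ (number of negative examples) / (number of positive examples) and then tune around that value. The sketch below assumes a synthetic dataset and the scikit-learn wrapper; it is an illustration, not the tuning advice the question refers to.

import numpy as np
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# Synthetic data with roughly 5% positives.
X, y = make_classification(n_samples=10000, weights=[0.95, 0.05], random_state=0)

n_neg, n_pos = np.bincount(y)
ratio = n_neg / n_pos                 # starting point for scale_pos_weight
print("scale_pos_weight heuristic:", ratio)

# scale_pos_weight multiplies the loss gradient (and hessian) of every positive
# example, which is the multiplication shown in the RegLossObj excerpt above.
clf = XGBClassifier(scale_pos_weight=ratio)
clf.fit(X, y)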
55 votes · 2 answers
Intuitive explanations of differences between Gradient Boosting Trees (GBM) & Adaboost
I'm trying to understand the differences between GBM & Adaboost.
This is what I've understood so far:
Both are boosting algorithms, which learn from the previous model's errors and finally make a weighted sum of the models.
GBM and Adaboost…

Hee Kyung Yoon (687)
47 votes · 1 answer
Explanation of min_child_weight in xgboost algorithm
The definition of the min_child_weight parameter in xgboost is given as the:
minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than…

User123456789 (613)
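A practical way to read the definition quoted above: the hessian of the squared-error loss is 1 for every instance, so with reg:squarederror min_child_weight acts as a minimum number of samples per leaf; for binary:logistic the hessian is p(1-p), so leaves full of confident predictions "weigh" less. The code below is an illustrative sketch with assumed parameter values.

from sklearn.datasets import make_regression
from xgboost import XGBRegressor

X, y = make_regression(n_samples=2000, n_features=10, random_state=0)

# With squared-error loss each instance contributes a hessian of 1, so
# min_child_weight=50 means no leaf may contain fewer than ~50 samples here.
model = XGBRegressor(objective="reg:squarederror", min_child_weight=50, max_depth=6)
model.fit(X, y)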
44 votes · 3 answers
Gradient Boosting for Linear Regression - why does it not work?
While learning about Gradient Boosting, I haven't heard about any constraints regarding the properties of a "weak classifier" that the method uses to build an ensemble model. However, I could not imagine an application of a GB that uses linear…

Matek (749)
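One way to see the point behind this question numerically: a sum of linear models is still a linear model, so a boosting stage that fits a second linear regression to the residuals of the first adds essentially nothing. The sketch below assumes squared-error loss, a learning rate of 1, and synthetic data.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=500)

stage1 = LinearRegression().fit(X, y)
residuals = y - stage1.predict(X)
stage2 = LinearRegression().fit(X, residuals)     # boosting step on the residuals

boosted = stage1.predict(X) + stage2.predict(X)
single = LinearRegression().fit(X, y).predict(X)

# OLS residuals are orthogonal to the features, so the second stage learns
# (numerically) zero coefficients and the ensemble equals the single fit.
print(np.allclose(boosted, single))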
42 votes · 1 answer
Relative variable importance for Boosting
I'm looking for an explanation of how relative variable importance is computed in Gradient Boosted Trees that is not overly general/simplistic like:
The measures are based on the number of times a variable is selected for splitting, weighted by the…

Antoine (5,740)
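For orientation (not a replacement for the mathematical explanation the question asks for): scikit-learn exposes this impurity-based measure as feature_importances_, where each split's squared-error improvement is credited to the splitting variable, then averaged over trees and normalized to sum to 1. The data below is a synthetic assumption.

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=2000, n_features=8, n_informative=3, random_state=0)

model = GradientBoostingRegressor(n_estimators=200, random_state=0).fit(X, y)
for i, imp in enumerate(model.feature_importances_):
    print(f"feature {i}: {imp:.3f}")      # relative importances, summing to 1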
36 votes · 1 answer
Mathematical differences between GBM, XGBoost, LightGBM, CatBoost?
There exist several implementations of the GBDT family of models, such as:
GBM
XGBoost
LightGBM
CatBoost.
What are the mathematical differences between these different implementations?
CatBoost seems to outperform the other implementations even by…

Metariat (2,376)
36 votes · 2 answers
Is this the state-of-the-art regression methodology?
I've been following Kaggle competitions for a long time and I have come to realize that many winning strategies involve using at least one of the "big three": bagging, boosting and stacking.
For regressions, rather than focusing on building one best…

Maxareo (535)
35 votes · 3 answers
What algorithms need feature scaling, besides SVM?
I am working with many algorithms: RandomForest, DecisionTrees, NaiveBayes, SVM (kernel=linear and rbf), KNN, LDA and XGBoost. All of them were pretty fast except for SVM. That is when I learned that it needs feature scaling to work faster. Then…

Aizzaac (989)
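The usual rule of thumb behind this question: models based on distances or margins (SVM, KNN, and anything fit by regularized gradient descent) are sensitive to feature scale, whereas tree ensembles such as random forests and XGBoost split on thresholds and are invariant to monotone rescaling. The pipeline below is an assumed illustration, not taken from the question.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X[:, 0] *= 1000                                   # put one feature on a wildly different scale

models = {
    "svm, unscaled": SVC(kernel="rbf"),
    "svm, scaled": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "random forest": RandomForestClassifier(random_state=0),  # splits on thresholds, no scaling needed
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())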
35 votes · 1 answer
XGBoost Loss function Approximation With Taylor Expansion
As an example, take the objective function of the XGBoost model on the $t$'th iteration:
$$\mathcal{L}^{(t)}=\sum_{i=1}^n\ell(y_i,\hat{y}_i^{(t-1)}+f_t(\mathbf{x}_i))+\Omega(f_t)$$
where $\ell$ is the loss function, $f_t$ is the $t$'th tree output…

Alex R. (13,097)
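For readers skimming the listing, the step this question turns on is the second-order Taylor expansion of $\ell$ around the previous prediction $\hat{y}_i^{(t-1)}$, as used in the XGBoost paper:

$$\mathcal{L}^{(t)}\approx\sum_{i=1}^n\Big[\ell(y_i,\hat{y}_i^{(t-1)})+g_i\,f_t(\mathbf{x}_i)+\tfrac{1}{2}h_i\,f_t^2(\mathbf{x}_i)\Big]+\Omega(f_t),\qquad g_i=\partial_{\hat{y}_i^{(t-1)}}\ell(y_i,\hat{y}_i^{(t-1)}),\quad h_i=\partial^2_{\hat{y}_i^{(t-1)}}\ell(y_i,\hat{y}_i^{(t-1)}).$$

Dropping the constant $\ell(y_i,\hat{y}_i^{(t-1)})$ terms leaves a quadratic objective in $f_t$ that can be minimized leaf by leaf.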
33 votes · 1 answer
What are some useful guidelines for GBM parameters?
What are some useful guidelines for testing parameters (e.g. interaction depth, minchild, sample rate, etc.) using GBM?
Let's say I have 70-100 features, a population of 200,000 and I intend to test interaction depths of 3 and 4. Clearly I need to do…

Ram Ahluwalia (3,003)