Questions tagged [boosting]

A family of algorithms combining weakly predictive models into a strongly predictive model. The most common approach is called gradient boosting, and the most commonly used weak models are classification/regression trees.
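A minimal sketch of the idea, assuming scikit-learn's GradientBoostingClassifier and purely illustrative data and settings: each boosting stage fits a shallow tree to the gradient of the loss at the current predictions.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    # illustrative synthetic data (not from any of the questions below)
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # each boosting stage fits a shallow regression tree to the loss gradient
    gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                    max_depth=3, random_state=0)
    gb.fit(X_train, y_train)
    print("test accuracy:", gb.score(X_test, y_test))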

1296 questions
287 votes · 8 answers

Bagging, boosting and stacking in machine learning

What are the similarities and differences between these three methods: bagging, boosting, and stacking? Which one is the best, and why? Can you give me an example of each?
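As a rough, hedged illustration of how the three strategies differ in practice (the scikit-learn estimators, base learners and data below are assumptions, not taken from the answers): bagging averages one learner over bootstrap resamples, boosting fits learners sequentially on the errors of the ensemble so far, and stacking fits a meta-model on the predictions of diverse learners.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                                  RandomForestClassifier, StackingClassifier)
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, random_state=0)

    # bagging: one base learner (a decision tree by default) on bootstrap resamples
    bagging = BaggingClassifier(n_estimators=100, random_state=0)

    # boosting: trees fit sequentially, each one correcting its predecessors
    boosting = GradientBoostingClassifier(random_state=0)

    # stacking: diverse learners, with a meta-model fit on their predictions
    stacking = StackingClassifier(
        estimators=[("rf", RandomForestClassifier(random_state=0)),
                    ("gb", GradientBoostingClassifier(random_state=0))],
        final_estimator=LogisticRegression())

    for name, model in [("bagging", bagging), ("boosting", boosting),
                        ("stacking", stacking)]:
        print(name, cross_val_score(model, X, y, cv=5).mean())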
162 votes · 3 answers

Gradient Boosting Tree vs Random Forest

Gradient tree boosting as proposed by Friedman uses decision trees as base learners. I'm wondering whether we should make the base decision tree as complex as possible (fully grown) or simpler. Is there any explanation for the choice? Random Forest is…
FihopZz
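A hedged sketch of the conventional contrast asked about here: boosting typically uses weak, shallow base trees, while a random forest grows each tree deep and averages. The scikit-learn regressors and parameter values are illustrative assumptions.

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)

    # boosting: many shallow (weak) trees, combined with a small learning rate
    gbt = GradientBoostingRegressor(max_depth=3, n_estimators=300,
                                    learning_rate=0.05, random_state=0)

    # random forest: each tree is grown deep (max_depth=None), then averaged
    rf = RandomForestRegressor(max_depth=None, n_estimators=300, random_state=0)

    for name, model in [("gradient boosting", gbt), ("random forest", rf)]:
        print(name, cross_val_score(model, X, y, cv=5).mean())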
71 votes · 4 answers

How to tune hyperparameters of xgboost trees?

I have class-imbalanced data and I want to tune the hyperparameters of the boosted trees using xgboost. Questions: Is there an equivalent of gridsearchcv or randomsearchcv for xgboost? If not, what is the recommended approach to tune the parameters…
GeorgeOfTheRF
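One common route, offered here only as a sketch: xgboost's scikit-learn wrapper (XGBClassifier) plugs directly into GridSearchCV or RandomizedSearchCV. The parameter ranges, the roc_auc scoring choice, and the synthetic imbalanced data are assumptions for illustration.

    from scipy.stats import randint, uniform
    from sklearn.datasets import make_classification
    from sklearn.model_selection import RandomizedSearchCV
    from xgboost import XGBClassifier

    # illustrative imbalanced data (roughly 10% positives)
    X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)

    param_distributions = {
        "max_depth": randint(2, 8),
        "learning_rate": uniform(0.01, 0.3),
        "n_estimators": randint(100, 600),
        "subsample": uniform(0.6, 0.4),
        "colsample_bytree": uniform(0.6, 0.4),
    }

    search = RandomizedSearchCV(
        XGBClassifier(eval_metric="logloss"),
        param_distributions,
        n_iter=30,
        scoring="roc_auc",   # a ranking metric is often less misleading under imbalance
        cv=5,
        random_state=0,
    )
    search.fit(X, y)
    print(search.best_params_)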
62 votes · 7 answers

Why doesn't Random Forest handle missing values in predictors?

What are the theoretical reasons not to handle missing values? Gradient boosting machines and regression trees handle missing values. Why doesn't Random Forest do that?
Fedorenko Kristina
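A small illustration of the contrast (the workaround shown is a common one, not necessarily what the answers recommend): xgboost learns a default direction for missing values at each split, so NaNs can be passed in directly, whereas a plain RandomForestClassifier has traditionally needed imputation first.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import make_pipeline
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=1000, random_state=0)
    rng = np.random.RandomState(0)
    X[rng.rand(*X.shape) < 0.1] = np.nan   # knock out ~10% of the entries

    # xgboost routes missing values down a learned default branch per split
    XGBClassifier(eval_metric="logloss").fit(X, y)

    # a common workaround for random forests: impute before fitting
    make_pipeline(SimpleImputer(strategy="median"),
                  RandomForestClassifier(random_state=0)).fit(X, y)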
59 votes · 6 answers

Is random forest a boosting algorithm?

Short definition of boosting: Can a set of weak learners create a single strong learner? A weak learner is defined to be a classifier which is only slightly correlated with the true classification (it can label examples better than random…
Atilla Ozgur
56 votes · 4 answers

What is the proper usage of scale_pos_weight in xgboost for imbalanced datasets?

I have a very imbalanced dataset. I'm trying to follow the tuning advice and use scale_pos_weight, but I am not sure how I should tune it. I can see that RegLossObj.GetGradient does: if (info.labels[i] == 1.0f) w *= param_.scale_pos_weight so a gradient…
ihadanny
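The xgboost documentation's usual starting point is the ratio of negative to positive examples; a minimal sketch of that rule of thumb on illustrative data (tuning from there, e.g. by cross-validation, is still up to the user):

    import numpy as np
    from sklearn.datasets import make_classification
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=10000, weights=[0.95, 0.05], random_state=0)

    # rule of thumb: scale_pos_weight = (# negative examples) / (# positive examples)
    spw = np.sum(y == 0) / np.sum(y == 1)

    clf = XGBClassifier(scale_pos_weight=spw, eval_metric="aucpr")
    clf.fit(X, y)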
55 votes · 2 answers

Intuitive explanations of differences between Gradient Boosting Trees (GBM) & Adaboost

I'm trying to understand the differences between GBM & Adaboost. This is what I've understood so far: Both are boosting algorithms, which learn from the previous model's errors and finally take a weighted sum of the models. GBM and Adaboost…
Hee Kyung Yoon
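One concrete way to relate the two, offered as a sketch rather than the thread's answer: AdaBoost can be viewed as boosting with the exponential loss, and scikit-learn's gradient boosting exposes that loss directly, so the two can be run side by side on the same data.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=2000, random_state=0)

    models = {
        "AdaBoost (reweighting view)": AdaBoostClassifier(random_state=0),
        "GBM, exponential loss": GradientBoostingClassifier(loss="exponential",
                                                            random_state=0),
        "GBM, default log loss": GradientBoostingClassifier(random_state=0),
    }
    for name, model in models.items():
        print(name, cross_val_score(model, X, y, cv=5).mean())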
47 votes · 1 answer

Explanation of min_child_weight in xgboost algorithm

The definition of the min_child_weight parameter in xgboost is given as: the minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than…
User123456789
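For intuition (a worked note, not the accepted answer): the "instance weight" is the second derivative of the loss with respect to the prediction, so what min_child_weight means depends on the loss,

$$h_i = \frac{\partial^2 \ell(y_i,\hat{y}_i)}{\partial \hat{y}_i^2}, \qquad \ell=\tfrac{1}{2}(y_i-\hat{y}_i)^2 \;\Rightarrow\; h_i = 1, \qquad \ell=-\big[y_i\log p_i+(1-y_i)\log(1-p_i)\big],\; p_i=\sigma(\hat{y}_i) \;\Rightarrow\; h_i = p_i(1-p_i),$$

and a split is kept only if $\sum_{i\in\text{leaf}} h_i$ meets the threshold in both children: for squared error this reduces to a minimum number of instances per leaf, while for logistic loss near-certain instances contribute almost nothing to the sum.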
44 votes · 3 answers

Gradient Boosting for Linear Regression - why does it not work?

While learning about Gradient Boosting, I haven't heard about any constraints regarding the properties of a "weak classifier" that the method uses to build an ensemble model. However, I could not imagine an application of a GB that uses linear…
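A short version of the usual argument, given here as a sketch: a sum of linear base learners is itself linear, so boosting them never leaves the hypothesis space of a single linear regression,

$$F_M(\mathbf{x})=\sum_{m=1}^{M}\nu\big(\boldsymbol{\beta}_m^\top\mathbf{x}+b_m\big)=\Big(\nu\sum_{m=1}^{M}\boldsymbol{\beta}_m\Big)^{\!\top}\mathbf{x}+\nu\sum_{m=1}^{M}b_m,$$

and with squared loss the first least-squares fit already minimizes the loss over that space: its residuals are orthogonal to the features, so a second least-squares base learner fitted to them has all-zero coefficients.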
42 votes · 1 answer

Relative variable importance for Boosting

I'm looking for an explanation of how relative variable importance is computed in Gradient Boosted Trees that is not overly general/simplistic like: The measures are based on the number of times a variable is selected for splitting, weighted by the…
Antoine
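For reference, Friedman's definition (as summarized in The Elements of Statistical Learning) sums the squared improvement in the splitting criterion over all internal nodes that split on variable $j$ and averages over the $M$ trees,

$$\hat{I}_j^2(T)=\sum_{t=1}^{J-1}\hat{i}_t^{\,2}\,\mathbb{1}\big(v(t)=j\big), \qquad \hat{I}_j^2=\frac{1}{M}\sum_{m=1}^{M}\hat{I}_j^2(T_m),$$

where $\hat{i}_t^{\,2}$ is the estimated improvement in squared error from the split at internal node $t$ and $v(t)$ is the variable used there; the resulting importances are usually rescaled so that the largest equals 100.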
36 votes · 1 answer

Mathematical differences between GBM, XGBoost, LightGBM, CatBoost?

There exist several implementations of the GBDT family of models, such as GBM, XGBoost, LightGBM, and CatBoost. What are the mathematical differences between these implementations? CatBoost seems to outperform the other implementations even by…
Metariat
36 votes · 2 answers

Is this the state of the art regression methodology?

I've been following Kaggle competitions for a long time and I have come to realize that many winning strategies involve using at least one of the "big three": bagging, boosting and stacking. For regressions, rather than focusing on building one best…
35 votes · 3 answers

What algorithms need feature scaling, besides SVM?

I am working with many algorithms: RandomForest, DecisionTrees, NaiveBayes, SVM (kernel=linear and rbf), KNN, LDA and XGBoost. All of them were pretty fast except for SVM. That is when I got to know that it needs feature scaling to work faster. Then…
Aizzaac
35 votes · 1 answer

XGBoost Loss function Approximation With Taylor Expansion

As an example, take the objective function of the XGBoost model on the $t$'th iteration: $$\mathcal{L}^{(t)}=\sum_{i=1}^n\ell(y_i,\hat{y}_i^{(t-1)}+f_t(\mathbf{x}_i))+\Omega(f_t)$$ where $\ell$ is the loss function, $f_t$ is the $t$'th tree output…
Alex R.
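For context, the second-order Taylor expansion used in the XGBoost paper treats the new tree's output $f_t(\mathbf{x}_i)$ as the step and expands $\ell$ around the current prediction $\hat{y}_i^{(t-1)}$,

$$\mathcal{L}^{(t)}\simeq\sum_{i=1}^{n}\Big[\ell\big(y_i,\hat{y}_i^{(t-1)}\big)+g_i f_t(\mathbf{x}_i)+\tfrac{1}{2}h_i f_t^2(\mathbf{x}_i)\Big]+\Omega(f_t), \qquad g_i=\partial_{\hat{y}_i^{(t-1)}}\ell\big(y_i,\hat{y}_i^{(t-1)}\big),\quad h_i=\partial^2_{\hat{y}_i^{(t-1)}}\ell\big(y_i,\hat{y}_i^{(t-1)}\big),$$

after which the first term is constant in $f_t$ and can be dropped when choosing the tree structure and leaf weights.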
33 votes · 1 answer

What are some useful guidelines for GBM parameters?

What are some useful guidelines for testing parameters (e.g. interaction depth, minchild, sample rate, etc.) using GBM? Let's say I have 70-100 features, a population of 200,000 and I intend to test interaction depth of 3 and 4. Clearly I need to do…
Ram Ahluwalia
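One way to organize such a test, sketched with scikit-learn's GradientBoostingClassifier; the names map onto the R gbm knobs (max_depth ~ interaction.depth, min_samples_leaf ~ n.minobsinnode, subsample ~ bag.fraction), and the grid values are illustrative apart from the depths 3 and 4 mentioned in the question.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV

    # smaller than the 200,000-row population in the question, for illustration
    X, y = make_classification(n_samples=5000, n_features=80, random_state=0)

    param_grid = {
        "max_depth": [3, 4],           # interaction depth
        "min_samples_leaf": [20, 50],  # minimum observations per terminal node
        "subsample": [0.5, 0.8],       # row sampling rate per boosting stage
        "learning_rate": [0.05, 0.1],
        "n_estimators": [300],
    }

    search = GridSearchCV(GradientBoostingClassifier(random_state=0), param_grid, cv=3)
    search.fit(X, y)
    print(search.best_params_)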