Questions tagged [bagging]

Bagging, or bootstrap aggregation, is a special case of model averaging. Given a standard training set, bagging generates $m$ new training sets by bootstrapping, and the results of applying some training method to each of the $m$ generated data sets are then averaged. Bagging can stabilize the results of unstable methods such as decision trees.

See https://en.wikipedia.org/wiki/Bootstrap_aggregating for more information and references.
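
A minimal sketch of this procedure in Python, assuming a scikit-learn decision tree as the unstable base learner (the function name and defaults here are illustrative, not a library API):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_predict(X_train, y_train, X_test, m=50, seed=0):
    """Bagging sketch: fit m trees on bootstrap resamples and average their predictions.

    X_train, y_train, X_test are NumPy arrays; m is the number of bootstrap replicates.
    """
    rng = np.random.default_rng(seed)
    n = len(X_train)
    predictions = []
    for _ in range(m):
        # Draw a bootstrap sample: n indices sampled with replacement.
        idx = rng.integers(0, n, size=n)
        tree = DecisionTreeRegressor().fit(X_train[idx], y_train[idx])
        predictions.append(tree.predict(X_test))
    # Aggregate by averaging the m predictions.
    return np.mean(predictions, axis=0)
```

For classification, the aggregation step would typically be a majority vote over the $m$ fitted models rather than a mean; scikit-learn's BaggingRegressor and BaggingClassifier implement the same idea.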

173 questions
287 votes · 8 answers

Bagging, boosting and stacking in machine learning

What are the similarities and differences between these three methods: bagging, boosting, and stacking? Which is the best one, and why? Can you give me an example of each?
59 votes · 6 answers

Is random forest a boosting algorithm?

Short definition of boosting: Can a set of weak learners create a single strong learner? A weak learner is defined to be a classifier which is only slightly correlated with the true classification (it can label examples better than random…
Atilla Ozgur • 1,251
36 votes · 2 answers

Is this the state-of-the-art regression methodology?

I've been following Kaggle competitions for a long time and I have come to realize that many winning strategies involve at least one of the "big three": bagging, boosting, and stacking. For regressions, rather than focusing on building one best…
24 votes · 2 answers

What does "node size" refer to in the Random Forest?

I do not understand exactly what is meant by node size. I know what a decision node is, but not what node size is.
wolfsatthedoor • 771
22 votes · 1 answer

Boosting AND Bagging Trees (XGBoost, LightGBM)

There are many blog posts, YouTube videos, etc. about the ideas of bagging or boosting trees. My general understanding is that the pseudo-code for each is: Bagging: take N random samples of x% of the samples and y% of the features, fit your model…
Jonathan • 393
21 votes · 3 answers

When should I not use an ensemble classifier?

In general, in a classification problem where the goal is to accurately predict out-of-sample class membership, when should I not use an ensemble classifier? This question is closely related to Why not always use ensemble learning?. That question…
shadowtalker • 11,395
19 votes · 5 answers

Are Random Forest and Boosting parametric or non-parametric?

By reading the excellent Statistical Modeling: The Two Cultures (Breiman 2001), we can grasp the differences between traditional statistical models (e.g., linear regression) and machine learning algorithms (e.g., Bagging, Random Forest, Boosted…
Antoine • 5,740
19 votes · 1 answer

What are the theoretical guarantees of bagging?

I've (approximately) heard that: bagging is a technique to reduce the variance of a predictor/estimator/learning algorithm. However, I have never seen a formal mathematical proof of this statement. Does anyone know why this is mathematically…
Charlie Parker • 5,836
18 votes · 3 answers

Why does a bagged tree / random forest tree have higher bias than a single decision tree?

If we consider a fully grown decision tree (i.e. an unpruned decision tree), it has high variance and low bias. Bagging and Random Forests use these high-variance models and aggregate them in order to reduce variance and thus enhance prediction…
C. Refsgaard • 431
17 votes · 2 answers

Why does the scikit-learn bootstrap function resample the test set?

When using bootstrapping for model evaluation, I always thought the out-of-bag samples were directly used as a test set. However, this appears not to be the case for the deprecated scikit-learn Bootstrap approach, which seems to build the test set…
16 votes · 2 answers

Why not always use ensemble learning?

It seems to me that ensemble learning will always give better predictive performance than just a single learning hypothesis. So why don't we use it all the time? My guess is that it's perhaps because of computational limitations? (even then, we use…
user46925
15 votes · 1 answer

What bagging algorithms are worthy successors to Random Forest?

For boosting algorithms, I would say that they have evolved pretty well. In 1995, AdaBoost was introduced; then after some time came the Gradient Boosting Machine (GBM). More recently, around 2015, XGBoost was introduced, which is accurate, handles…
Marius • 381
14 votes · 7 answers

Random Forest and Decision Tree Algorithm

A random forest is a collection of decision trees following the bagging concept. When we move from one decision tree to the next, how does the information learned by the last decision tree carry forward to the next? Because, as per my…
Abhay Raj Singh • 141
11 votes · 2 answers

Is cross validation unnecessary for Random Forest?

Is it fair to say that cross-validation (k-fold or otherwise) is unnecessary for Random Forest? I've read that this is the case because we can look at out-of-bag performance metrics, and these are doing the same thing. Please help me understand this in the…
steve d • 177
10 votes · 1 answer

Should pruning be avoided for bagging (with decision trees)?

I came across several posts and papers claiming that pruning trees in a "bagging" ensemble of trees is not needed (see 1). However, is it necessarily (or at least in some known cases) damaging to perform pruning (say, with the OOB sample) on the…
Tal Galili • 19,935