Questions tagged [adaboost]

A popular boosting algorithm (short for "adaptive boosting"). Boosting combines weakly predictive models into a strongly predictive model.

AdaBoost, short for "Adaptive Boosting", is a machine learning meta-algorithm in which the outputs of other learning algorithms ('weak learners') are combined into a weighted sum that represents the final output of the boosted classifier.

The following is pseudo-code for the AdaBoost algorithm:


  • Given $m$ labeled training examples $(x_1,y_1),\dots,(x_m,y_m)$ where $x_i \in \mathcal{X}$ and $y_i \in \{-1,+1\}$
  • Initialize: $D_1(i) = \frac1m$ for $i = 1, \dots, m$
  • For $t=1,\dots,T$:
    • Train the weak learner using distribution $D_t$
    • Get weak hypothesis $h_t: \mathcal{X} \rightarrow \{-1,+1\}$
    • Aim: select $h_t$ with low weighted error: $$\epsilon_t = \text{Pr}_{i \sim D_t}[h_t(x_i) \ne y_i]$$
    • Choose: $\alpha_t = \frac12 \ln\big(\frac{1-\epsilon_t}{\epsilon_t}\big)$
    • Update, for $i = 1, \dots, m$: $$D_{t+1}(i) = \frac{D_t(i)\,e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$$

Where $Z_t$ is a normalization factor chosen such that $D_{t+1}$ is a probability distribution.

Output the final hypothesis: $$H(x) = \text{sign}(\sum_{t=1}^{T}\alpha_th_t(x))$$


On each round, a distribution $D_t$ is computed over the $m$ training examples, and a given weak learner or weak learning algorithm is applied to find a weak hypothesis $h_t: \mathcal{X} \rightarrow \{-1,+1\}$; the aim of the weak learner is to find a weak hypothesis with low weighted error $\epsilon_t$ relative to $D_t$. The final hypothesis $H$ is computed as a weighted majority vote of the weak hypotheses $h_t$, where each is assigned weight $\alpha_t$.
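
For concreteness, here is a minimal NumPy/scikit-learn sketch of the loop above, using depth-1 decision trees (stumps) as the weak learner. The function and variable names are purely illustrative, not part of any particular library's API.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    """Minimal AdaBoost sketch; labels y must be encoded as -1/+1."""
    m = X.shape[0]
    D = np.full(m, 1.0 / m)                  # D_1(i) = 1/m
    stumps, alphas = [], []
    for t in range(T):
        # Train the weak learner on the current distribution D_t
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=D)
        pred = stump.predict(X)
        # Weighted error eps_t relative to D_t
        eps = np.clip(np.sum(D[pred != y]), 1e-10, 1 - 1e-10)
        # alpha_t = 0.5 * ln((1 - eps_t) / eps_t)
        alpha = 0.5 * np.log((1 - eps) / eps)
        # D_{t+1}(i) = D_t(i) * exp(-alpha_t * y_i * h_t(x_i)) / Z_t
        D *= np.exp(-alpha * y * pred)
        D /= D.sum()                         # Z_t: renormalize to a distribution
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """H(x) = sign(sum_t alpha_t * h_t(x))."""
    scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(scores)
```

Clipping $\epsilon_t$ away from 0 and 1 simply guards against a perfect (or perfectly wrong) stump producing an infinite $\alpha_t$; it is not part of the algorithm as stated above.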


121 questions
55 votes, 2 answers

Intuitive explanations of differences between Gradient Boosting Trees (GBM) & Adaboost

I'm trying to understand the differences between GBM & Adaboost. This is what I've understood so far: both are boosting algorithms, which learn from the previous models' errors and finally make a weighted sum of the models. GBM and Adaboost…
Hee Kyung Yoon
43 votes, 3 answers

What is meant by 'weak learner'?

Can anyone tell me what is meant by the phrase 'weak learner'? Is it supposed to be a weak hypothesis? I am confused about the relationship between a weak learner and a weak classifier. Are both the same or is there some difference? In the adaboost…
vrushali
18 votes, 2 answers

Deep learning vs. Decision trees and boosting methods

I am looking for papers or texts that compare and discuss (either empirically or theoretically) boosting and decision-tree algorithms such as Random Forests, AdaBoost, and GentleBoost applied to decision trees, with deep learning methods…
15 votes, 2 answers

Boosting A Logistic Regression Model

Adaboost is an ensemble method that combines many weak learners to form a strong one. All of the examples of AdaBoost that I have read use decision stumps/trees as weak learners. Can I use different weak learners in AdaBoost? For example, how to…
gnikol
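
As an aside on the question above: in scikit-learn, any classifier that accepts sample_weight in its fit method can be plugged in as the weak learner. A minimal sketch with logistic regression follows; note that the keyword is estimator on recent scikit-learn releases and base_estimator on older ones.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)

# Logistic regression as the weak learner instead of a decision stump.
clf = AdaBoostClassifier(
    estimator=LogisticRegression(max_iter=1000),  # base_estimator= on older sklearn
    n_estimators=100,
    random_state=0,
)
clf.fit(X, y)
print(clf.score(X, y))
```
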
14 votes, 1 answer

When would one want to use AdaBoost?

As I've heard of the AdaBoost classifier repeatedly mentioned at work, I wanted to get a better feel for how it works and when one might want to use it. I've gone ahead and read a number of papers and tutorials on it which I found on Google, but…
YuliaPro
8 votes, 1 answer

Common weak learners for Adaboost

I'm looking for a set of weak classifiers that work with Adaboost to test on popular datasets. Most of the examples on the web use some kind of random weak learners which work on their own randomly generated dataset. Could you point me to any usable…
garak
7 votes, 1 answer

learning rate in Adaboost sklearn

I can't figure out what learning_rate stands for in the sklearn implementation of Adaboost. When I look at the original algorithm I don't see any "learning_rate"... Meanwhile I can see from https://fr.wikipedia.org/wiki/AdaBoost that the training errors…
curious
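
One common reading, consistent with the scikit-learn documentation, is that learning_rate is a shrinkage factor applied to each weak learner's weight, i.e. in terms of the pseudo-code above $\alpha_t$ is multiplied by learning_rate (the exact constant in front of the log differs between AdaBoost variants). A small illustrative sketch, with a hypothetical helper name:

```python
import numpy as np

def alpha_with_shrinkage(eps, learning_rate=1.0):
    # Weak-learner weight from the pseudo-code above, scaled by the shrinkage factor.
    return learning_rate * 0.5 * np.log((1 - eps) / eps)

# Smaller learning_rate -> smaller steps per round, usually compensated
# for by a larger number of estimators.
for lr in (1.0, 0.5, 0.1):
    print(lr, round(float(alpha_with_shrinkage(0.3, lr)), 4))
```
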
7 votes, 3 answers

Why is boosting less likely to overfit?

I've been learning about machine learning boosting methods (e.g., AdaBoost, gradient boosting) and the information sources mentioned that boosted tree methods are less likely to overfit than other machine learning methods. Why would that be the…
Bill Anderson
6 votes, 0 answers

Weighted Conditional Expectation definition in AdaBoost

I am looking at the paper "Additive logistic regression: a statistical view of boosting" (https://web.stanford.edu/~hastie/Papers/AdditiveLogisticRegression/alr.pdf). On page 346, the authors introduce a definition for a weighted expectation. I do not…
Aggamarcel
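
For reference, the weighted expectation in that line of work is usually defined by reweighting the ordinary expectation (check the paper itself for its exact notation): $$E_w[g(x,y)] = \frac{E[w(x,y)\,g(x,y)]}{E[w(x,y)]},$$ where in the AdaBoost setting the weights typically take the form $w(x,y) = e^{-yF(x)}$ for the current additive fit $F$.
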
6 votes, 1 answer

Can AdaBoost be used for regression?

I know that AdaBoost can be used for classification, but how about regression? With classification, it is clear how to assign the "amount of say" (or weight) to the predictions of each model (stump) in the final ensemble of models. Each of the…
MegaNightdude
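
In practice the answer is yes: scikit-learn's AdaBoostRegressor implements the AdaBoost.R2 variant, which re-weights examples by their (scaled) prediction error rather than by misclassification. A minimal sketch (again, the keyword estimator is called base_estimator on older scikit-learn versions):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, noise=10.0, random_state=0)

# AdaBoost.R2; the per-example loss can be 'linear', 'square' or 'exponential'.
reg = AdaBoostRegressor(
    estimator=DecisionTreeRegressor(max_depth=3),
    n_estimators=200,
    loss="linear",
    random_state=0,
)
reg.fit(X, y)
print(reg.score(X, y))  # R^2 on the training data
```
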
6 votes, 1 answer

Tuning adaboost

The boosting algorithm Adaboost (when using a tree) has three core parameters: the number of weak learners to train, the learning rate, and the max number of splits (depth of tree). What are good practices, perhaps proven empirically, for finding the appropriate value…
JohnAndrews
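
One common way to search over those three parameters jointly is cross-validated grid search. A minimal scikit-learn sketch follows; the grid values are arbitrary starting points rather than recommendations, and on older scikit-learn versions the nested key would be base_estimator__max_depth.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

param_grid = {
    "n_estimators": [50, 200, 500],
    "learning_rate": [0.01, 0.1, 1.0],
    "estimator__max_depth": [1, 2, 3],   # depth of the weak learner
}
search = GridSearchCV(
    AdaBoostClassifier(estimator=DecisionTreeClassifier(), random_state=0),
    param_grid,
    cv=5,
)
search.fit(X, y)
print(search.best_params_)
```
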
6 votes, 3 answers

How to tune the weak learner in boosted algorithms

It is commonly said that boosted algorithms (adaboost, gradient boosted trees) are composed of many "weak" learners. Let's stick to decision trees as the base learners. Some empirical studies recommended using trees with something like 5-10 terminal…
nikosd
6 votes, 1 answer

Adaboost - update of weights

I am self-studying AdaBoost and reading the following useful article: http://www.inf.fu-berlin.de/inst/ag-ki/adaboost4.pdf. I am trying to understand the following questions: 1) When we select and extract from the pool of…
Wouter
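
As a quick worked example of that weight update (numbers chosen purely for illustration): if the weak learner on round $t$ has weighted error $\epsilon_t = 0.3$, then $$\alpha_t = \tfrac12\ln\tfrac{0.7}{0.3} \approx 0.42,$$ so before renormalization each misclassified example's weight is multiplied by $e^{\alpha_t} \approx 1.53$ and each correctly classified example's weight by $e^{-\alpha_t} \approx 0.65$; dividing by $Z_t$ then makes the weights sum to one again.
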
5 votes, 2 answers

Using Adaboost for feature selection?

Is it okay to use Adaboost to do feature selection (selecting a subset of dimensions $S$ from a high-dimensional feature vector $V$)? I divided the samples into four non-overlapping sets: $A$ (training1), $B$ (validation), $C$ (training2), $D$…
DataHungry
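
One practical (if rough) way to do this with scikit-learn is to fit AdaBoost on the first training set and keep the dimensions with the largest feature_importances_, then evaluate the reduced representation on the separate sets to avoid selection bias. A minimal sketch, with the cutoff k chosen arbitrarily:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, n_features=50, n_informative=5,
                           random_state=0)

clf = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X, y)

# Keep the k dimensions with the largest impurity-based importances.
k = 10
selected = np.argsort(clf.feature_importances_)[::-1][:k]
X_reduced = X[:, selected]
print(sorted(selected.tolist()))
```
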
5 votes, 2 answers

How to ensure that increasing the weights of misclassified points in AdaBoost does not adversely affect the learning progress?

It seems that we increase the weights of misclassified points on every iteration of AdaBoost. Therefore, the subsequent classifiers focus on the misclassified samples more. This would imply that these classifiers are somewhat specialized for that…
Baron Yugovich