A popular boosting algorithm, short for "Adaptive Boosting". Boosting combines weakly predictive models ('weak learners') into a strongly predictive model.
AdaBoost is a machine learning meta-algorithm in which the outputs of the weak learners are combined into a weighted sum that represents the final output of the boosted classifier.
The following is pseudo-code for the AdaBoost algorithm:
- Given $m$ labeled training examples $(x_1,y_1),\ldots,(x_m,y_m)$ where $x_i \in \mathcal{X}$ and $y_i \in \{-1,+1\}$
- Initialize: $D_1(i) = \frac{1}{m}$ for $i = 1, \ldots, m$
- For $t = 1, \ldots, T$:
- Train weak learner using distribution $D_t$
- Get weak hypothesis $h_t: \mathcal{X} \rightarrow \{-1,+1\}$
- Aim: Select $h_t$ with low weighted error: $$\epsilon_t = \text{Pr}_{i \sim D_t}[h_t(x_i) \ne y_i]$$
- Choose: $\alpha_t = \frac{1}{2}\ln\big(\frac{1-\epsilon_t}{\epsilon_t}\big)$
- Update, for $i = 1, \ldots, m$: $$D_{t+1}(i) = \frac{D_t(i)e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$$
where $Z_t$ is a normalization factor chosen such that $D_{t+1}$ is a probability distribution.
- Output the final hypothesis: $$H(x) = \text{sign}\Big(\sum_{t=1}^{T}\alpha_t h_t(x)\Big)$$
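The pseudo-code above translates fairly directly into code. The following is a minimal NumPy sketch, not a reference implementation: it uses one-level decision stumps as the weak learners, and the helper names (`best_stump`, `stump_predict`) and the early stop when $\epsilon_t \ge \frac{1}{2}$ are illustrative choices rather than part of the algorithm itself.

```python
import numpy as np

def fit_adaboost(X, y, T=50):
    """Run T rounds of AdaBoost on X (m x d) and labels y in {-1, +1}."""
    m = len(y)
    D = np.full(m, 1.0 / m)                    # D_1(i) = 1/m
    stumps, alphas = [], []
    for _ in range(T):
        stump, eps = best_stump(X, y, D)       # train weak learner on D_t
        if eps >= 0.5:                         # no better than chance: stop
            break
        eps = max(eps, 1e-12)                  # avoid division by zero
        alpha = 0.5 * np.log((1 - eps) / eps)  # alpha_t
        h = stump_predict(X, stump)
        D = D * np.exp(-alpha * y * h)         # up-weight misclassified points
        D = D / D.sum()                        # normalize: D_{t+1} sums to 1
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def stump_predict(X, stump):
    """Predict {-1, +1} with a single-feature threshold test."""
    j, thresh, sign = stump
    return np.where(sign * (X[:, j] - thresh) > 0, 1, -1)

def best_stump(X, y, D):
    """Exhaustively pick the stump with the lowest weighted error under D."""
    best, best_eps = None, np.inf
    for j in range(X.shape[1]):
        for thresh in np.unique(X[:, j]):
            for sign in (1, -1):
                h = np.where(sign * (X[:, j] - thresh) > 0, 1, -1)
                eps = D[h != y].sum()          # weighted error eps_t
                if eps < best_eps:
                    best, best_eps = (j, thresh, sign), eps
    return best, best_eps

def predict(X, stumps, alphas):
    """Final hypothesis H(x): sign of the alpha-weighted vote of the stumps."""
    votes = sum(a * stump_predict(X, s) for s, a in zip(stumps, alphas))
    return np.sign(votes)
```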
On each round $t$, a distribution $D_t$ is computed over the $m$ training examples, and the given weak learning algorithm is applied to find a weak hypothesis $h_t: \mathcal{X} \rightarrow \{-1,+1\}$ whose weighted error $\epsilon_t$ relative to $D_t$ is low. The final hypothesis $H$ is a weighted majority vote of the weak hypotheses $h_t$, where each is assigned weight $\alpha_t$.
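In practice, AdaBoost is available in standard machine-learning libraries; scikit-learn's `AdaBoostClassifier`, for instance, implements this weighted-majority-vote scheme with shallow decision trees as the default weak learner. A minimal usage sketch follows; the synthetic dataset and parameter values are arbitrary illustrations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, purely for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators corresponds to T, the number of boosting rounds.
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```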