
On every iteration of AdaBoost we increase the weights of the misclassified points, so the subsequent classifiers focus more on those samples. This implies that these classifiers become somewhat specialized for the regions that were misclassified before.

However, the weights of the classifiers are not functions of the region they apply to. In other words, how do subsequent classifiers that focus on the misclassified points avoid introducing errors on points that were previously classified correctly? Note that these classifiers apply to those points as well.

How do we make sure we are not going in circles, breaking earlier correct decisions while we fix the wrong ones? How do we ensure we keep making progress?
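
For concreteness, the reweighting I am referring to is the usual discrete AdaBoost update (notation varies between references):

$$ w_i^{(m+1)} \;\propto\; w_i^{(m)} \exp\!\big(-\alpha_m\, y_i\, h_m(x_i)\big), $$

where $h_m$ is the classifier trained in round $m$, $y_i \in \{-1,+1\}$ is the label, and $\alpha_m > 0$ is the weight of that classifier. A point misclassified by $h_m$ has $y_i h_m(x_i) = -1$, so its weight grows; correctly classified points have their weights shrink.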

Borhan Kazimipour
Baron Yugovich
  • The weights given to each classifier are carefully chosen so as not to mess up the decisions made by the classifiers in previous iterations. – Aaron Mar 30 '18 at 14:49
  • Can you please elaborate? – Baron Yugovich Mar 31 '18 at 00:04
  • @BaronYugovich Aaron is basically referring to the learning rate. In theory it can be shown that a boosted ensemble of weak learners converges to a strong learner (more iterations -> better classifier). However, if AdaBoost encounters too many outliers, more iterations actually lead to worse predictions. – Laksan Nathan Apr 04 '18 at 09:52
  • I had asked the exact same question some time back: https://stats.stackexchange.com/questions/333248/adaboost-algorithm-question – sww May 04 '18 at 01:14

2 Answers


The new classifier in each round might indeed classify some of the old points incorrectly. However, the classifiers from previous iterations are not thrown away. The end result is an ensemble of all the classifiers from every round, where the contribution of each classifier is weighted by how well that particular classifier did in its own round.
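
Concretely, in the standard (discrete) AdaBoost formulation the final prediction is a weighted vote

$$ H(x) = \operatorname{sign}\!\left(\sum_{m=1}^{M} \alpha_m\, h_m(x)\right), \qquad \alpha_m = \tfrac{1}{2}\ln\frac{1-\epsilon_m}{\epsilon_m}, $$

where $\epsilon_m$ is the weighted training error of the classifier $h_m$ from round $m$. A classifier that does poorly on the weighted sample gets a small $\alpha_m$ and therefore has little say in the final vote.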

For example, if we have an outlier that is hard to classify correctly, the outlier will accumulate a lot of weight. Some later classifier will be forced to give priority to that point and classify it correctly, which might mean that all the other points are misclassified. However, that classifier's 'opinion' will not count for much in the end, because it classified only one point (the outlier) correctly. This is also a good way to detect outliers: just look for the points with very large weights.
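
To illustrate the "large weight means likely outlier" idea, here is a minimal from-scratch sketch of discrete AdaBoost (decision stumps from scikit-learn as weak learners; the function name and the choice to return the final sample weights are mine, purely for illustration) that lets you inspect which points ended up with the largest weights:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_sample_weights(X, y, n_rounds=50):
        """Discrete AdaBoost with decision stumps; returns the final sample weights.

        Assumes labels y take values in {-1, +1}.
        """
        n = len(y)
        w = np.full(n, 1.0 / n)              # start with uniform sample weights
        stumps, alphas = [], []

        for _ in range(n_rounds):
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=w)
            pred = stump.predict(X)

            # weighted error of this round's stump and its classifier weight
            eps = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - eps) / eps)

            # up-weight misclassified points, down-weight correct ones, renormalise
            w = w * np.exp(-alpha * y * pred)
            w = w / w.sum()

            stumps.append(stump)
            alphas.append(alpha)

        return stumps, alphas, w

    # Points whose final weight is far above average are candidate outliers, e.g.:
    # stumps, alphas, w = adaboost_sample_weights(X, y)
    # suspicious = np.argsort(w)[-10:]       # the ten "hardest" points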

I should add that you usually do not want to let AdaBoost converge because it will most probably overfit. You need to use a method like cross-validation to find the optimal number of rounds instead.
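
As a sketch of what that could look like in practice (using scikit-learn purely for illustration; the parameter grid and the X_train/y_train names are placeholders):

    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import GridSearchCV

    # Cross-validate the number of boosting rounds instead of running AdaBoost "to convergence".
    param_grid = {"n_estimators": [25, 50, 100, 200, 400]}
    search = GridSearchCV(AdaBoostClassifier(), param_grid, cv=5)
    # search.fit(X_train, y_train)           # X_train, y_train are placeholders
    # print(search.best_params_)             # number of rounds with the best CV score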

For a more formal treatment of why AdaBoost works, I would recommend reading section 14.3.1 of Bishop's Pattern Recognition and Machine Learning, or the paper below [1], which gives a statistical view of AdaBoost: it shows that the algorithm essentially minimises an exponential error function.
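
The exponential error mentioned above is, up to notational and scaling differences between references,

$$ E = \sum_{n=1}^{N} \exp\!\big(-y_n f(x_n)\big), \qquad f(x) = \sum_{m=1}^{M} \alpha_m h_m(x), $$

and each boosting round can be viewed as a greedy, stage-wise step that reduces this error.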

[1] Friedman, J., Hastie, T., and Tibshirani, R. (2000). "Additive logistic regression: a statistical view of boosting." The Annals of Statistics, 28(2), 337–407. doi:10.1214/aos/1016218223

Andreas G.

Please note that not only the training samples but also the classifiers are weighted in AdaBoost. Therefore, a new classifier that is only good on some over-weighted samples but terrible at classifying the easy samples (those that were already classified correctly by the previously trained classifiers) will not receive a significant weight. As a result, bad classifiers (hopefully!) cannot adversely affect the good performance of the others (except in overfitting scenarios, which are a different case).

The following is borrowed from this tutorial:

After each classifier is trained, the classifier’s weight is calculated based on its accuracy. More accurate classifiers are given more weight. A classifier with 50% accuracy is given a weight of zero, and a classifier with less than 50% accuracy (kind of a funny concept) is given negative weight.
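
As a quick sanity check of that statement using the usual classifier-weight formula $\alpha = \frac{1}{2}\ln\frac{1-\epsilon}{\epsilon}$, where $\epsilon$ is the weighted error rate:

$$ \epsilon = 0.1 \;\Rightarrow\; \alpha \approx 1.10, \qquad \epsilon = 0.5 \;\Rightarrow\; \alpha = 0, \qquad \epsilon = 0.7 \;\Rightarrow\; \alpha \approx -0.42. $$

So a classifier that is no better than chance contributes nothing to the vote, and one that is worse than chance effectively has its vote flipped.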

Borhan Kazimipour