I implemented AdaBoost as a binary classifier, but to handle multiple classes I train one model for each label in a one-vs-all arrangement, and at prediction time I choose the class whose model returns the highest score (a rough sketch of the setup is at the end of this post). Using 5-fold cross-validation with two distinct random states, this is the accuracy graph I get:
My question is: from which point can we speak of overfitting? Should the criterion be the "divergence of training and validation accuracy" (i.e., around 2000) or the "decrease of validation accuracy" (i.e., around 8000)?
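For reference, here is a minimal sketch of roughly the equivalent setup using scikit-learn. My own AdaBoost implementation differs, and the dataset, the `n_estimators` grid, and the use of `OneVsRestClassifier` here are stand-ins for illustration only:

```python
# Rough sketch of the described setup, with scikit-learn's AdaBoostClassifier
# standing in for my own implementation.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.multiclass import OneVsRestClassifier

X, y = load_digits(return_X_y=True)  # placeholder multi-class dataset

# Hypothetical grid of boosting rounds (the real sweep goes up to ~8000).
n_estimators_grid = [100, 500, 1000, 2000]

for random_state in (0, 1):  # two distinct random states for the CV split
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state)
    for n_estimators in n_estimators_grid:
        train_scores, val_scores = [], []
        for train_idx, val_idx in cv.split(X, y):
            # One binary AdaBoost model per label (one-vs-all); at prediction
            # time OneVsRestClassifier picks the class whose model returns
            # the highest score, as described above.
            clf = OneVsRestClassifier(
                AdaBoostClassifier(n_estimators=n_estimators,
                                   random_state=random_state)
            )
            clf.fit(X[train_idx], y[train_idx])
            train_scores.append(
                accuracy_score(y[train_idx], clf.predict(X[train_idx])))
            val_scores.append(
                accuracy_score(y[val_idx], clf.predict(X[val_idx])))
        print(f"seed={random_state} n_estimators={n_estimators} "
              f"train={np.mean(train_scores):.3f} val={np.mean(val_scores):.3f}")
```

The graph in the question plots these mean training and validation accuracies against the number of boosting rounds for each random state.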