
I implemented AdaBoost as a binary classifier, but I train one model per label in a one-vs-all arrangement, and at prediction time I choose the class whose model returns the highest value. Using 5-fold cross-validation with two distinct random states, this is the accuracy graph:

[Figure: training and validation accuracy vs. number of boosting iterations]
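For reference, a minimal sketch of this kind of one-vs-all AdaBoost setup in scikit-learn (the toy data, estimator count, and parameter values here are illustrative assumptions, not my actual configuration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier

# Toy multi-class data standing in for the real dataset
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=10, n_classes=3, random_state=0)

# One AdaBoost model per label; at prediction time OneVsRestClassifier
# picks the class whose model returns the highest decision value
clf = OneVsRestClassifier(AdaBoostClassifier(n_estimators=100, random_state=0))

# 5-fold cross-validation, scored by accuracy
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean accuracy = {scores.mean():.3f} (sd = {scores.std():.3f})")
```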

My question is: from which point can we speak of overfitting? Should the criterion be "divergence of training and validation accuracy" (i.e., around 2000 iterations) or "decrease of validation accuracy" (i.e., around 8000 iterations)?

1 Answer


Putting aside the compulsory reading on why accuracy is not the best measure for assessing classification models (https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models; for example, multi-class AUC-ROC is probably a sounder option), this learner indeed appears to start overfitting at approximately 8000 iterations. It would probably be better to use repeated $K$-fold cross-validation instead of a single $K$-fold run, so as to smooth out some of the sampling variance, but aside from that, this work seems to be in the clear.
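As a sketch of that suggestion, repeated $K$-fold cross-validation in scikit-learn might look like the following (the toy data, model, and repeat count are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.multiclass import OneVsRestClassifier

# Placeholder data and model, standing in for the asker's setup
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=10, n_classes=3, random_state=0)
clf = OneVsRestClassifier(AdaBoostClassifier(n_estimators=100, random_state=0))

# 5-fold CV repeated 10 times with different shuffles; averaging the
# 50 resulting fold scores gives a lower-variance accuracy estimate
# than a single 5-fold run
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)
print(f"mean accuracy = {scores.mean():.3f} (sd = {scores.std():.3f})")
```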

usεr11852