
When we build a classifier, such as an SVM or Naive Bayes, are there any generic rules or theoretical derivations for the size of the training data set? For example, to train an SVM-based classifier, what is the minimum training-set size as a function of the feature space and some target performance metrics, such as precision and recall?

user3125

1 Answer


Just a rule of thumb: use 90% of your original dataset (the samples for which you have labels, i.e. you know the true class of each one) for training.

The remaining 10% will be your test set.

I recommend AUC as the metric to compute on your test set. Also look at the ROC curve itself, since the AUC is just a single number.

It is better to randomize the order of your samples before cutting them into the 90% / 10% split.
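The recipe above (shuffled 90/10 split, then AUC and ROC on the held-out 10%) can be sketched as follows, assuming scikit-learn; the dataset here is synthetic and the classifier settings are illustrative, not a recommendation:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score, roc_curve

# Synthetic labeled dataset, standing in for your real one.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# shuffle=True randomizes sample order before cutting 90% / 10%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, shuffle=True, random_state=0)

# probability=True lets the SVM output scores usable for ROC/AUC.
clf = SVC(probability=True, random_state=0).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]

auc = roc_auc_score(y_test, scores)               # single-number summary
fpr, tpr, thresholds = roc_curve(y_test, scores)  # full ROC curve to plot
print(f"test AUC: {auc:.3f}")
```

`fpr` and `tpr` can be passed straight to a plotting library to draw the ROC curve, which shows the precision/recall trade-off that the single AUC number hides.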

daruma