
When we build a classifier, such as an SVM or Naive Bayes, are there any generic rules or theoretical derivations for the size of the training data set? For example, to train an SVM-based classifier, what is the minimum training-set size as a function of the feature space and some target performance metrics, such as precision and recall?

user3125

1 Answer


Just a rule of thumb: use 90% of your original dataset (the samples for which you have labels, i.e. you know the true class of each one) for training.

The remaining 10% will be your test set.

I recommend AUC as the metric to compute on your test set. Also look at the ROC curve itself, since the AUC is just a single number.

It is better to randomize the order of your samples before cutting them into the 90% / 10% split.
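The recipe above (shuffled 90/10 split, then AUC and ROC on the held-out 10%) can be sketched as follows, assuming scikit-learn; the dataset here is synthetic and the classifier settings are illustrative, not a recommendation:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score, roc_curve

# Synthetic labeled dataset, standing in for your real one.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# shuffle=True randomizes sample order before cutting 90% / 10%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, shuffle=True, random_state=0)

# probability=True lets the SVM output scores usable for ROC/AUC.
clf = SVC(probability=True, random_state=0).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]

auc = roc_auc_score(y_test, scores)               # single-number summary
fpr, tpr, thresholds = roc_curve(y_test, scores)  # full ROC curve to plot
print(f"test AUC: {auc:.3f}")
```

`fpr` and `tpr` can be passed straight to a plotting library to draw the ROC curve, which shows the precision/recall trade-off that the single AUC number hides.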

daruma