When we build a classifier, such as an SVM or naive Bayes, are there any generic rules or theoretical results on the required size of the training set? For example, to train an SVM-based classifier, what is the minimum amount of training data needed, in terms of the dimensionality of the feature space and some target performance metrics, such as precision and recall?
1 Answer
Just a rule of thumb: for the training set, take 90% of your original dataset (the one for which you have labels, i.e. you know the true class of each sample). The remaining 10% will be your test set.
I recommend AUC as the metric to compute on your test set, but also look at the ROC curve itself, since AUC compresses the whole curve into a single number.
It is also better to randomize the order of your samples before making the 90%/10% cut, so that both sets are representative of the whole dataset. A sketch of this workflow is below.
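A minimal sketch of this workflow, assuming scikit-learn is available; the dataset here is synthetic placeholder data, so substitute your own labeled samples. `train_test_split` shuffles by default, which covers the randomization step:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score, roc_curve

# Placeholder labeled data; replace with your own (X, y).
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# 90% train / 10% test, shuffled before the cut (shuffle=True is the default).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42
)

# Any classifier that outputs probability scores works; an SVM is used
# here since the question mentions it.
clf = SVC(probability=True, random_state=42).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]

# AUC as the single-number summary, plus the full ROC curve to inspect.
print("AUC:", roc_auc_score(y_test, scores))
fpr, tpr, thresholds = roc_curve(y_test, scores)
```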

daruma