
In a binary classification task, I have a small training set (n=900, 9 features). The two classes are imbalanced (class 1 = 560, class 0 = 340). I also have a test set (n=400) for which the class labels are unknown.

Let's say I want to check whether an SVM works well. To estimate the best hypothesis I'll use cross-validation. How do I choose k?

If I choose k=10, I get a training fold of 810 cases and a validation fold of 90 cases. Bias is low, but variance is high. (A minimal sketch of that setup is below.)
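For reference, here is a minimal sketch of what a stratified 10-fold setup on data of this size might look like (assuming scikit-learn; the SVC hyperparameters and the synthetic data are placeholders, not my real features):

```python
# Minimal sketch: stratified 10-fold CV keeps the 560/340 class ratio
# in every fold, which matters when the classes are imbalanced.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(900, 9))            # placeholder: 900 cases, 9 features
y = np.array([1] * 560 + [0] * 340)      # imbalanced labels as in my data

clf = SVC(kernel="rbf", C=1.0)           # hypothetical hyperparameters
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

scores = cross_val_score(clf, X, y, cv=cv)   # 10 folds of roughly 810/90 cases
print(scores.mean(), scores.std())           # mean accuracy and its spread across folds
```

The spread of the fold scores is what worries me: with only ~90 cases per validation fold, the per-fold estimates can vary a lot.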

Does anybody know a rule of thumb for this kind of situation?

  • For better answers I suggest you say what classification procedure you're conducting. If it's predicting a binary outcome, how high is the incidence rate of that outcome? E.g., 90 cases would be insufficient if that rate is 10%. – rolando2 Nov 27 '14 at 13:51
  • Does this general question on choosing $k$: http://stats.stackexchange.com/questions/27730/choice-of-k-in-k-fold-cross-validation help? – cbeleites unhappy with SX Nov 27 '14 at 14:35

0 Answers