I am working on a binary classification problem with relatively few instances (~30 instances, of which ~7 are positive).
I have noticed that with 2-fold CV, the average performance of the best-performing model is higher than that of the best-performing model with 5-fold CV.
Specifically, the best-performing model in 2-fold CV gets the following scores across the two folds:
[0.82, 0.82]
(avg. = 0.82). That model is different from the best one I get with 5-fold CV, which yields the following AUC scores:
[0.4 , 1. , 0.75, 0.75, 0.25]
(avg. = 0.63).
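To make the comparison concrete, here is a minimal sketch (plain Python, standard library only) that recomputes the mean and the spread of the per-fold scores listed above. The variable and function names are my own and purely illustrative; the scores are copied from the two lists.

```python
from statistics import mean, stdev

# Per-fold scores copied from the two cross-validation runs above.
scores_2fold = [0.82, 0.82]
scores_5fold = [0.4, 1.0, 0.75, 0.75, 0.25]


def summarize(scores):
    """Return (mean, sample standard deviation) of a list of per-fold scores."""
    return mean(scores), stdev(scores)


for name, scores in [("2-fold", scores_2fold), ("5-fold", scores_5fold)]:
    avg, spread = summarize(scores)
    # Print the average together with the spread and range across folds.
    print(f"{name}: mean={avg:.2f}, std={spread:.2f}, "
          f"min={min(scores)}, max={max(scores)}")
```

The 5-fold scores vary far more from fold to fold than the 2-fold scores. If the folds are stratified, each 5-fold test fold would hold only one or two of the ~7 positives, so I suspect each per-fold score is very coarse.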
This brings me to two questions: Which model should I use? And why would I get a better score when each model is trained on less data (half of the instances in 2-fold CV versus four fifths in 5-fold CV)?