I am evaluating a genetic programming algorithm, using the Proben1 cancer1 dataset to assess the models it produces. This dataset contains 699 samples, which I currently divide into 50% training, 25% validation, and 25% test data. Many academic articles use k-fold cross-validation to evaluate the resulting models.
I understand that training k models reduces the variance of the estimated error rate. However, I do not understand why it would not be preferable to repeat the hold-out method (k=2) x times, randomly partitioning the data into training and test sets on each repetition.
The reason for my confusion is that I believe models evaluated with a larger k generalize less, due to the simple fact that they are trained on a higher percentage of the data.
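To make the two procedures concrete, here is a minimal sketch of what I mean (scikit-learn; a decision tree and synthetic data are hypothetical stand-ins for my GP-evolved models and the cancer1 data, which are not shown here):

```python
# Sketch: repeated hold-out vs. k-fold CV on a 699-sample problem.
# Assumptions (not from my setup): DecisionTreeClassifier stands in
# for the GP models, make_classification for the cancer1 dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit, KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=699, random_state=0)  # stand-in data
model = DecisionTreeClassifier(random_state=0)             # stand-in model

# Repeated hold-out: x independent 50/50 splits (training fraction stays 50%).
holdout = ShuffleSplit(n_splits=10, train_size=0.5, random_state=0)
holdout_scores = cross_val_score(model, X, y, cv=holdout)

# 10-fold CV: each model trains on 90% and is tested on the remaining 10%.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
kfold_scores = cross_val_score(model, X, y, cv=kfold)

print("hold-out: %.3f +/- %.3f" % (holdout_scores.mean(), holdout_scores.std()))
print("10-fold : %.3f +/- %.3f" % (kfold_scores.mean(), kfold_scores.std()))
```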
Given my dataset of 699 samples, which of the following two methods would be preferable, and why? (Both options are sketched in code after the list.)
- the training/validation/test split, perhaps repeated x times with samples randomly reassigned to each set, giving a test set of 25%;
- 10-fold cross-validation, which leaves a test set of only 10% per fold.
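For reference, this is how I would generate the splits for the first option (the second would use `KFold` exactly as in the sketch above); the repetition count x is arbitrary here, and the GP training step is only indicated as a comment:

```python
# Sketch of option 1: a random 50/25/25 train/validation/test split,
# repeated x times. Index-based, so it applies to any 699-sample dataset.
import numpy as np

rng = np.random.default_rng(0)
n = 699
x_repeats = 10  # hypothetical number of repetitions

for _ in range(x_repeats):
    idx = rng.permutation(n)
    train_idx = idx[: n // 2]            # ~50% for training (349 samples)
    val_idx = idx[n // 2 : 3 * n // 4]   # ~25% for validation (175 samples)
    test_idx = idx[3 * n // 4 :]         # ~25% for testing (175 samples)
    # ... evolve the GP on train_idx, select on val_idx, report on test_idx
```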