I am evaluating a genetic programming algorithm, using the Proben1 cancer1 dataset to assess the models it produces. This dataset contains 699 samples, which I currently divide into 50% training, 25% validation, and 25% test data. Many academic articles use k-fold cross-validation to evaluate the resulting models.
I understand that training k models reduces the variance of the estimated error rate. However, I do not understand why it would not be preferable to repeat the hold-out method (k=2) x times, randomly partitioning the data into training and test sets on each repetition.
The reason for my confusion is that I believe models evaluated with a larger k generalize less, due to the simple fact that they are trained on a higher percentage of the data.
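To make the two procedures concrete, here is a minimal sketch of what I mean (scikit-learn; a decision tree and synthetic data are hypothetical stand-ins for my GP-evolved models and the cancer1 data, which are not shown here):

```python
# Sketch: repeated hold-out vs. k-fold CV on a 699-sample problem.
# Assumptions (not from my setup): DecisionTreeClassifier stands in
# for the GP models, make_classification for the cancer1 dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit, KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=699, random_state=0)  # stand-in data
model = DecisionTreeClassifier(random_state=0)             # stand-in model

# Repeated hold-out: x independent 50/50 splits (training fraction stays 50%).
holdout = ShuffleSplit(n_splits=10, train_size=0.5, random_state=0)
holdout_scores = cross_val_score(model, X, y, cv=holdout)

# 10-fold CV: each model trains on 90% and is tested on the remaining 10%.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
kfold_scores = cross_val_score(model, X, y, cv=kfold)

print("hold-out: %.3f +/- %.3f" % (holdout_scores.mean(), holdout_scores.std()))
print("10-fold : %.3f +/- %.3f" % (kfold_scores.mean(), kfold_scores.std()))
```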
Given my dataset of 699 samples, which of the following two methods would be preferable, and why? (Both options are sketched in code after the list.)
- the training/validation/test split, perhaps repeated x times with samples randomly reassigned to each set, giving a test set of 25%;
- 10-fold cross-validation, which leaves a test set of only 10% per fold.
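For reference, this is how I would generate the splits for the first option (the second would use `KFold` exactly as in the sketch above); the repetition count x is arbitrary here, and the GP training step is only indicated as a comment:

```python
# Sketch of option 1: a random 50/25/25 train/validation/test split,
# repeated x times. Index-based, so it applies to any 699-sample dataset.
import numpy as np

rng = np.random.default_rng(0)
n = 699
x_repeats = 10  # hypothetical number of repetitions

for _ in range(x_repeats):
    idx = rng.permutation(n)
    train_idx = idx[: n // 2]            # ~50% for training (349 samples)
    val_idx = idx[n // 2 : 3 * n // 4]   # ~25% for validation (175 samples)
    test_idx = idx[3 * n // 4 :]         # ~25% for testing (175 samples)
    # ... evolve the GP on train_idx, select on val_idx, report on test_idx
```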