I'm using the $k$-fold cross-validation technique to generate train, test, and validation indices for a neural network. My sample size is roughly 230 to 700. What is the best $k$ for cross-validation here? I'm currently using 10-fold cross-validation, but I think it is too high. What is your opinion?
- Have you tried $k$-fold cross-validation to determine the best $k$? – AdamO Sep 02 '14 at 16:58
- No. I used it to get a more reliable model (accuracy, sensitivity, and specificity) for my classification problem. Also, this is the cost function of an optimization algorithm, and I need a more reliable average cost. – user2991243 Sep 02 '14 at 17:00
- I'm just kidding. It should just be enough to give you confidence that there is no uncertainty due to the subsample choice. Traditional train-test validation is $k=1$, remember; $k$-fold "kicks in" pretty quickly as far as $k$ is concerned, in my opinion. Double $k$-fold is not totally uncalled for if you *have* to know: just do iterative split-sample validation within your other $k$ to see how variable those model performance statistics are, but beware of small-sample bias issues if you end up with very small $n$s there. – AdamO Sep 02 '14 at 17:09
- Oh, I didn't get that :-D So what is your opinion for this sample size? Do you think 10-fold is good? Also, I'm using a neural network in MATLAB, and in all my network designs the validation check (maximum = 6) is what stops the training. – user2991243 Sep 02 '14 at 17:11
- Are you sure that's training? I think most software reports iterations of the backpropagation, not validation; 6 sounds like the case for that. I haven't used MATLAB. 10-fold is almost always fine regardless of sample size. If sample size *is* an issue, then you should be validating with a bootstrap instead! – AdamO Sep 02 '14 at 17:19
- Yes. The default early stopping in the Neural Network Toolbox is set to a maximum of 6 validation checks. Thank you. – user2991243 Sep 02 '14 at 17:38
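AdamO's suggestion in the comments, repeating the $k$-fold split to see how variable the performance statistics are, can be sketched roughly as follows. This is a minimal sketch, assuming Python with scikit-learn, synthetic data, and a logistic-regression classifier as a stand-in for the MATLAB neural network in the question; none of these names come from the discussion itself.

```python
# Sketch: repeat the k-fold split with different shuffles and look at how much
# the performance statistics move across repeats. The classifier and dataset
# are placeholders, not the question's MATLAB network.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Toy data in the same size range as the question (a few hundred samples).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 10-fold CV repeated 5 times with different random shuffles of the data.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

# If the spread across repeats is small, the subsample choice adds little
# uncertainty and the chosen k is "enough".
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```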
1 Answer
Actually, there is no single answer to the choice of $k$ in $k$-fold cross-validation. A higher $k$ gives you more, but smaller, subsets on which to run testing. A common choice is to select the $k$ that gives you test sets of about 15% of your total dataset (roughly $k \approx 7$).
However, other methods are also available; you may want to consider permutation or exhaustive cross-validation methods (more info here).
Hope it helps.

Filippo Mazza
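Following the answer's rule of thumb, a minimal sketch of picking $k$ so that each test fold is roughly 15% of the data might look like the following. scikit-learn's `KFold` is an assumption here; the question itself uses MATLAB's Neural Network Toolbox.

```python
# Sketch: choose k so each held-out fold is about 15% of the dataset,
# then generate the train/test index splits.
import numpy as np
from sklearn.model_selection import KFold

n_samples = 500               # anywhere in the question's ~230-700 range
test_fraction = 0.15          # target size of each test fold
k = round(1 / test_fraction)  # ~7 folds -> ~15% held out per fold

X_dummy = np.zeros((n_samples, 1))  # placeholder features, only used for indices
kf = KFold(n_splits=k, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X_dummy)):
    print(f"fold {fold}: train={len(train_idx)}, test={len(test_idx)}")
```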