
I'm using the $k$-fold cross validation technique to generate train, test, and validation indexes for a neural network. My sample size is between 230 and 700. What is the best $k$ for cross validation here? Currently I'm using 10-fold cross validation, but I think it is too high. What is your idea?

Sycorax
user2991243
  • 1
    Have you tried K-K-fold cross validation to determine the best K? – AdamO Sep 02 '14 at 16:58
  • No. I used it to get a more reliable model (accuracy, sensitivity, and specificity) for my classification problem. Also, this is the cost function of an optimization algorithm and I need a more reliable average cost. – user2991243 Sep 02 '14 at 17:00
  • 1
    I'm just kidding. It should just be enough to have confidence there's no uncertainty due to subsample choice. Traditional train-test validation is $k=1$, remember. $k$-fold "kicks in pretty quickly" as far as the $k$ is concerned, in my opinion. Double $k$-fold is not totally uncalled for if you HAVE to know, just do iterative split sample validation in your other $k$ to see how variable those model performance statistics are, but beware of small sample bias issues if you are getting very small $n$s there. – AdamO Sep 02 '14 at 17:09
  • Oh, I didn't get that :-D So what is your opinion for this sample size? Do you think 10-fold is good? Also, I'm using a neural network, and when I look at the main page in MATLAB, in all the neural network designs it is validation (maximum = 6) that stops the training. – user2991243 Sep 02 '14 at 17:11
  • Are you sure that's training? I think most software tends to report iterations of the backpropagation, not validation; 6 seems like the case for that. I haven't used MATLAB. 10-fold is almost always fine regardless of sample size. If sample size *is* an issue, then you should be validating with a bootstrap instead! – AdamO Sep 02 '14 at 17:19
  • Yes. The default early stopping of the neural network toolbox is set to a maximum of 6 iterations. Thank you. – user2991243 Sep 02 '14 at 17:38

1 Answer


Actually, there is no single right answer to the choice of $K$ in $k$-fold cross validation. A higher $k$ gives you more, but smaller, subsets on which to run testing. A common choice is to select the $K$ that gives you a test set of about 15% of your total dataset, i.e. $k \approx 7$.
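As a sketch of that rule of thumb (using scikit-learn rather than the MATLAB toolbox mentioned in the question, and a hypothetical sample size from the asker's stated range):

```python
import numpy as np
from sklearn.model_selection import KFold

n = 500  # hypothetical sample size within the 230-700 range from the question
X = np.arange(n)

# A ~15% test fold corresponds to k = round(1 / 0.15) = 7
k = round(1 / 0.15)

kf = KFold(n_splits=k, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    print(f"fold {fold}: train={len(train_idx)}, test={len(test_idx)}")
```

Each of the 7 folds then holds out roughly 71-72 of the 500 samples, close to the 15% target.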

However, other methods are also available; you may want to consider permutation or exhaustive cross validation methods (more info here).
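To illustrate the difference between those two families (again a scikit-learn sketch, not part of the original answer): exhaustive cross validation tests every possible held-out subset, which explodes combinatorially, while a permutation-style approach draws repeated random splits.

```python
import numpy as np
from sklearn.model_selection import LeavePOut, ShuffleSplit

X = np.arange(10)  # tiny toy set: exhaustive CV grows combinatorially with n

# Exhaustive: every size-2 subset is used as the test set exactly once
lpo = LeavePOut(p=2)
print(lpo.get_n_splits(X))  # C(10, 2) = 45 splits

# Randomized alternative: a fixed number of independent 85/15 splits
ss = ShuffleSplit(n_splits=20, test_size=0.15, random_state=0)
print(ss.get_n_splits())  # 20 splits, regardless of n
```

For the asker's sample sizes (230-700), fully exhaustive schemes are impractical, which is why $k$-fold or repeated random splits are the usual choices.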

Hope it helps.