Is there a limit to how big you should make $K$ for $K$-fold cross-validation? I understand that performing the CV takes longer as $K$ gets bigger, but aside from that, is there any reason not to make $K = n$? Is there a point where your validation sets are so small that, even after averaging them, the result is garbage?
- Consider removing the sample-size tag, I am not sure this is what you are looking for here – Christoph Hanck Jun 19 '15 at 08:26
- http://stats.stackexchange.com/questions/90902/why-is-leave-one-out-cross-validation-loocv-variance-about-the-mean-estimate-f – Zhubarb Jun 19 '15 at 08:39
- $K = n$ is also known as Leave-One-Out Cross-Validation. "The most obvious advantage" of $k = 5$ or $k = 10$ "is computational, but putting computational issues aside, a less obvious but potentially more important advantage is that it often gives more accurate estimates of the test error rate than" $k = n$ does. As @Christoph Hanck posted, it has to do with a bias-variance trade-off. $k = 5$ or $k = 10$ "have been shown empirically to yield test error rate estimates that suffer neither from excessively high bias nor from very high variance". [James et al. (2013), *ISL*, 5.1.3-5.1.4.] – Ekaba Bisong Jun 19 '15 at 09:09
- @Zhubarb....thanks for the link (+1), was still typing when yours came in. – Ekaba Bisong Jun 19 '15 at 09:14
- @EkabaBisong: While it's fine to cite other people, you must make it clear when you're doing so & give a proper attribution. Please read the help [here](http://stats.stackexchange.com/help/referencing) & re-write your comment (or expand it into an answer). [As you weren't around, I've done it for you.] – Scortchi - Reinstate Monica Jun 19 '15 at 12:12
- @EkabaBisong I've expanded the quoting in your comment to cover the parts of sec 5.1.3. StackExchange requires proper attribution; you can quote such works but you must clearly indicate the source of them. – Glen_b Jun 19 '15 at 12:43
- @Scortchi and Glen_b....Thank you for the help and duly noted. +1 – Ekaba Bisong Jun 19 '15 at 16:23
1 Answer
It's related to a bias-variance tradeoff. If you take $K = n$, each training fold will be of size $n-1$, almost as large as your actual training sample. So the predictions from these training sets will be based on almost as much information as that contained in the full training sample, thus mimicking its predictive performance quite well on average, resulting in low bias.
On the other hand, these training sets will be highly correlated, as they are all almost identical. And basic statistics tells us that averaging highly correlated random variables produces an average (here: the CV error, i.e. the average of the prediction errors from the $K$ folds) that is still highly variable.
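To make that concrete (a standard identity, not part of the original answer): if the $K$ per-fold errors $e_1, \dots, e_K$ each have variance $\sigma^2$ and pairwise correlation $\rho$, then
$$\operatorname{Var}\!\left(\frac{1}{K}\sum_{k=1}^{K} e_k\right) = \frac{\sigma^2}{K} + \frac{K-1}{K}\,\rho\,\sigma^2,$$
which never drops below $\rho\,\sigma^2$ no matter how large $K$ gets. With nearly identical training sets, $\rho$ is close to 1, so the averaging buys little variance reduction.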
James et al. (2013, *An Introduction to Statistical Learning*, Sections 5.1.3-5.1.4) recommend $K = 5$ or $K = 10$ as a compromise in this tradeoff.
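For illustration only (this sketch is not part of the answer above), here is one way to compare $K = 5$, $K = 10$, and $K = n$ with scikit-learn's `KFold` and `LeaveOneOut`; the synthetic data, linear model, and scoring choice are arbitrary assumptions made for the example:

```python
# Minimal sketch comparing K=5, K=10, and K=n (LOOCV) with scikit-learn.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

# Arbitrary synthetic regression problem, just to have something to cross-validate.
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
model = LinearRegression()

for name, cv in [("K=5", KFold(n_splits=5, shuffle=True, random_state=0)),
                 ("K=10", KFold(n_splits=10, shuffle=True, random_state=0)),
                 ("K=n (LOOCV)", LeaveOneOut())]:
    # cross_val_score returns one score per fold; the CV estimate is their average.
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    print(f"{name}: mean MSE = {-scores.mean():.2f} over {len(scores)} folds")
```

Note that the LOOCV run fits the model $n$ times, which is where the computational cost mentioned in the comments comes from.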

Christoph Hanck
- +1 for the link. In terms of bias reduction, Leave-one-out CV is preferred over k-fold CV. But as you point out, bias is not the only concern we have. Leave-one-out CV has higher variance than k-fold due to the fact that the fold training sets in Leave-one-out are very highly correlated. (Not to mention Leave-one-out CV is just impractical for most problems due to computational time.) – Zhubarb Jun 19 '15 at 08:33