
I have read that leave-one-out cross-validation provides a relatively “unbiased estimate of the true generalization performance” (e.g. here), and that this is an advantageous property of leave-one-out CV.

However, I don't see how this follows from the properties of leave-one-out CV. Why is the bias of this estimator low when compared to others?

Update:

I keep investigating the topic, and I believe it has to do with the fact that this estimator is less pessimistic than, say, K-fold cross-validation, since it uses all of the data except one instance, but it would be great to see a mathematical derivation of this.
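For what it's worth, the "all the data but one instance" point can be made concrete with a quick sketch (pure Python; the dataset size n = 100 is a hypothetical choice for illustration):

```python
# Per-fold training-set size under k-fold CV, for a hypothetical n = 100.
# LOO is the special case k = n.
n = 100
sizes = {k: n - n // k for k in (2, 5, 10, n)}
for k, m in sorted(sizes.items()):
    print(f"k={k:3d}: trains on {m} of {n} samples per fold")
# LOO (k = n) fits on n - 1 = 99 samples, so its error estimate reflects a
# model trained on almost the entire data set; smaller k is more pessimistic.
```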

Amelio Vazquez-Reina
    Hi Amelio - I have seen several mathematical derivations of this in the papers cited in this answer https://stats.stackexchange.com/questions/280665/does-k-fold-cv-with-k-n-loo-provide-the-most-or-least-variable-estimates/358278#358278 - is there one in particular you want to see? – Xavier Bourret Sicotte Jul 24 '18 at 12:07

1 Answer


I don't think there is a need for a mathematical derivation of the fact that in ML, with increasing training set size, prediction error rates decrease. LOO -- compared to k-fold cross-validation -- maximizes the training set size, as you have observed.
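One way to make this concrete without a full derivation is a small simulation. The sketch below is a toy setup of my own (Gaussian data and a "predict the training mean" model -- none of it from the original post): for this model, a training set of size m has expected test MSE of sigma^2 * (1 + 1/m), so LOO (m = n - 1) sits close to the performance of a model trained on all n points, while 2-fold CV (m = n/2) is more pessimistic.

```python
import random

random.seed(0)

def cv_mse(y, k):
    """k-fold CV estimate of test MSE for the model 'predict the training mean'."""
    folds = [y[i::k] for i in range(k)]           # simple interleaved split
    total, count = 0.0, 0
    for i in range(k):
        train = [v for j, f in enumerate(folds) if j != i for v in f]
        mu_hat = sum(train) / len(train)          # "fit" the model
        for v in folds[i]:                        # score on the held-out fold
            total += (v - mu_hat) ** 2
            count += 1
    return total / count

n, reps = 10, 3000
# Average the CV estimates over many fresh N(0, 1) datasets.
loo_est = sum(cv_mse([random.gauss(0, 1) for _ in range(n)], n) for _ in range(reps)) / reps
two_est = sum(cv_mse([random.gauss(0, 1) for _ in range(n)], 2) for _ in range(reps)) / reps

# Theory for this toy model: E[MSE] = 1 + 1/m for training-set size m.
print(f"LOO    estimate: {loo_est:.3f}  (theory for m=9: {1 + 1/9:.3f})")
print(f"2-fold estimate: {two_est:.3f}  (theory for m=5: {1 + 1/5:.3f})")
```

Both estimators are pessimistic relative to a model trained on all n points (expected MSE 1 + 1/n here), but LOO is the least pessimistic of the two.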

However, LOO can be sensitive to "twinning" -- when you have highly correlated samples, LOO guarantees that for each sample used as the test set, its remaining "twins" are in the training set. This can be diagnosed by a rapid drop in accuracy when LOO is replaced by, say, 10-fold cross-validation (or a stratified validation if, for example, the samples are paired). In my experience, this can lead to disaster when your data set is small.
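A quick toy sketch of the twinning effect (all data and choices here are hypothetical): duplicate every sample, assign pure-noise labels, and use 1-nearest-neighbour. LOO looks perfect because each test point's twin is always in the training set, while a grouped split that keeps each pair together (one way to implement the paired validation mentioned above) drops to chance level.

```python
import random

random.seed(1)

def knn1(train, x):
    """1-nearest-neighbour label for x over (feature, label) pairs."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def cv_accuracy(data, k):
    """Plain k-fold CV accuracy (k = len(data) gives LOO)."""
    folds = [data[i::k] for i in range(k)]
    correct = total = 0
    for i in range(k):
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        for x, y in folds[i]:
            correct += (knn1(train, x) == y)
            total += 1
    return correct / total

def grouped_cv_accuracy(pairs, k):
    """k-fold CV where both copies of a twin pair always share a fold."""
    folds = [pairs[i::k] for i in range(k)]
    correct = total = 0
    for i in range(k):
        train = [p for j, f in enumerate(folds) if j != i for pair in f for p in pair]
        for x, y in (p for pair in folds[i] for p in pair):
            correct += (knn1(train, x) == y)
            total += 1
    return correct / total

# 30 points with random features and pure-noise labels, each duplicated ("twins").
base = [(random.random(), random.randint(0, 1)) for _ in range(30)]
pairs = [(b, b) for b in base]
data = base + base

loo_acc = cv_accuracy(data, len(data))        # twin is always in the training set
grouped_acc = grouped_cv_accuracy(pairs, 10)  # twins are held out together
print(f"LOO accuracy:    {loo_acc:.2f}")      # looks perfect despite noise labels
print(f"grouped 10-fold: {grouped_acc:.2f}")  # near chance (0.5)
```

Since the labels are pure noise, any accuracy above 0.5 is an artifact -- the gap between the two numbers is exactly the twinning optimism the answer warns about.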

In a perfect world, you also have a validation set that you never use to train your model, not even in a CV setting. You keep it for the sole purpose of testing the final performance of the model before you send off the paper :-)

January