
In *An Introduction to Statistical Learning*, the following statement is made comparing leave-one-out cross-validation (LOOCV) to using a single validation set:

> LOOCV has a couple of major advantages over the validation set approach. First, it has far less bias. In LOOCV, we repeatedly fit the statistical learning method using training sets that contain $n - 1$ observations, almost as many as are in the entire data set. This is in contrast to the validation set approach, in which the training set is typically around half the size of the original data set. Consequently, the LOOCV approach tends not to overestimate the test error rate as much as the validation set approach does.

I would have thought that LOOCV simply provides a better estimate than a single validation set because it fits on more data, and that this would give it lower variance rather than lower bias. Why would the bias be lower?

1 Answer


LOOCV has a lower variance of the fit compared with the validation set approach, but its aim is not the fit itself; it is the estimation of the generalisation error. What we'd like is an estimate of the generalisation error of a fit based on $n$ observations. A validation set approach where you split the data into two halves gives you an estimate of the generalisation error of a fit based on half of the observations. As that fit has a larger variance, the generalisation error will on average be estimated too high (variance in the fit translates into bias in the generalisation error, because a fit that goes badly wrong in any direction relative to the observation you want to predict will yield a high error). LOOCV gives us an estimate of the generalisation error of a fit based on $n-1$ observations. This is still fewer than $n$, so some bias remains in the estimated generalisation error, but it is smaller, because $n-1$ observations allow for a more precise fit.
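Below is a minimal simulation sketch of this point. It is not from the answer itself; the data-generating process, sample sizes, and use of scikit-learn are my own assumptions. It compares the average LOOCV and 50/50 validation-set estimates of test MSE for a linear model against the (approximate) true error of a fit on all $n$ observations.

```python
# Minimal sketch (hypothetical setup): bias of LOOCV vs. a 50/50
# validation-set estimate of test MSE for a simple linear model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, train_test_split

rng = np.random.default_rng(0)
n, n_sims, sigma = 30, 500, 1.0

def make_data(n):
    x = rng.uniform(-2, 2, size=(n, 1))
    y = 1.0 + 2.0 * x[:, 0] + rng.normal(0, sigma, size=n)
    return x, y

# Large hold-out sample to approximate the true generalisation error.
x_big, y_big = make_data(100_000)

loocv_est, valset_est, true_err = [], [], []
for _ in range(n_sims):
    x, y = make_data(n)

    # LOOCV: average squared error over n fits, each on n-1 observations.
    errs = [(y[te] - LinearRegression().fit(x[tr], y[tr]).predict(x[te]))[0] ** 2
            for tr, te in LeaveOneOut().split(x)]
    loocv_est.append(np.mean(errs))

    # Validation set: one fit on half the data, tested on the other half.
    xtr, xva, ytr, yva = train_test_split(x, y, test_size=0.5)
    m = LinearRegression().fit(xtr, ytr)
    valset_est.append(np.mean((yva - m.predict(xva)) ** 2))

    # Approximate true error of the fit on all n observations.
    m = LinearRegression().fit(x, y)
    true_err.append(np.mean((y_big - m.predict(x_big)) ** 2))

print(f"approx. true error (fit on n):  {np.mean(true_err):.3f}")
print(f"mean LOOCV estimate:            {np.mean(loocv_est):.3f}")
print(f"mean validation-set estimate:   {np.mean(valset_est):.3f}")
```

With a correctly specified model like this, both estimates should exceed the true error on average, with the validation-set estimate exceeding it by more, which is the bias the answer describes.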

Christian Hennig
  • Would the estimation of the generalization error also have less variance with LOOCV, in addition to having less bias? – interoception Dec 30 '20 at 15:28
  • No. The problem with LOOCV is that the subsamples of size $n-1$ have a large overlap (of size $n-2$). This means that they will not vary as much in terms of the fit as independent samples would. This translates into a fairly large variance of the estimated generalisation error, because on another full independent sample all LOOCV fits, and therefore the estimated generalisation error, can be quite different. If for a validation set approach you split the data set just once, the variance may be even higher, but if you split lots of times, it will be lower because of less overlap between subsamples (the sketch below illustrates this comparison). – Christian Hennig Dec 30 '20 at 23:08
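As a companion to the sketch above, and to illustrate the comment's point, the following hypothetical simulation compares the spread, across independent samples, of three estimators of the generalisation error: LOOCV, a single 50/50 split, and the average over many repeated 50/50 splits. Whether repeated splitting actually ends up less variable than LOOCV depends on the setup; the code only reports what happens under these assumptions.

```python
# Minimal sketch (hypothetical setup): variance, across independent
# samples, of LOOCV vs. single-split vs. repeated-split error estimates.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, ShuffleSplit

rng = np.random.default_rng(1)
n, n_sims, n_repeats, sigma = 30, 500, 50, 1.0

def make_data(n):
    x = rng.uniform(-2, 2, size=(n, 1))
    y = 1.0 + 2.0 * x[:, 0] + rng.normal(0, sigma, size=n)
    return x, y

def split_mse(x, y, tr, te):
    # MSE on the held-out indices te after fitting on the indices tr.
    m = LinearRegression().fit(x[tr], y[tr])
    return np.mean((y[te] - m.predict(x[te])) ** 2)

loocv, single, repeated = [], [], []
for _ in range(n_sims):
    x, y = make_data(n)
    loocv.append(np.mean([split_mse(x, y, tr, te)
                          for tr, te in LeaveOneOut().split(x)]))
    splits = [split_mse(x, y, tr, te)
              for tr, te in ShuffleSplit(n_splits=n_repeats,
                                         test_size=0.5).split(x)]
    single.append(splits[0])          # one random half/half split
    repeated.append(np.mean(splits))  # average over many splits

print(f"sd of LOOCV estimate:          {np.std(loocv):.3f}")
print(f"sd of single-split estimate:   {np.std(single):.3f}")
print(f"sd of repeated-split estimate: {np.std(repeated):.3f}")
```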