
I have some data and I want to build a model (say, a linear regression model) from this data. As a next step, I want to apply Leave-One-Out Cross-Validation (LOOCV) to the model to see how well it performs.

If I understand LOOCV correctly, I build a new model for each of my samples: the held-out sample is the test set, and every other sample forms the training set. Then I use that model to predict the held-out sample and calculate the error $(\text{predicted} - \text{actual})$.

Next, I aggregate all of the errors using a chosen function, for example the mean squared error. I can use these values to judge the quality (or goodness of fit) of the model.
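For concreteness, here is a minimal sketch of the procedure I have in mind, assuming scikit-learn's LinearRegression and made-up toy data (the variable names are only illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))                       # toy data: 30 samples, 2 predictors
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=30)

errors = []
for i in range(len(X)):                            # leave sample i out
    train = np.delete(np.arange(len(X)), i)        # indices of the n-1 training samples
    model = LinearRegression().fit(X[train], y[train])
    pred = model.predict(X[i:i + 1])[0]            # predict the single held-out sample
    errors.append(pred - y[i])                     # (predicted - actual)

loocv_mse = np.mean(np.square(errors))             # aggregate the errors, e.g. with MSE
print(loocv_mse)
```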

Question: Which model do these quality values apply to, i.e. which model should I choose if I find the metrics generated by LOOCV acceptable for my case? LOOCV looked at $n$ different models (where $n$ is the sample size); which one should I choose?

  • Is it the model which uses all the samples? This model was never calculated during the LOOCV process!
  • Is it the model which has the least error?

1 Answer


It is best to think of cross-validation as a way of estimating the generalisation performance of models generated by a particular procedure, rather than of the model itself. Leave-one-out cross-validation is essentially an estimate of the generalisation performance of a model trained on $n-1$ samples of data, which is generally a slightly pessimistic estimate of the performance of a model trained on $n$ samples.

Rather than choosing one model, the thing to do is to fit the model to all of the data, and use LOO-CV to provide a slightly conservative estimate of the performance of that model.
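To make that workflow concrete, here is a minimal sketch (assuming scikit-learn, with toy data standing in for the questioner's sample): the LOOCV score only estimates the performance of the fitting procedure, while the model that is actually kept is fit on all $n$ samples.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))                       # toy stand-in for the real sample
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=30)

# LOOCV estimates the generalisation error of the *procedure* (OLS fit on n-1 points)
scores = cross_val_score(LinearRegression(), X, y,
                         cv=LeaveOneOut(), scoring="neg_mean_squared_error")
loocv_mse = -scores.mean()                         # slightly conservative performance estimate

final_model = LinearRegression().fit(X, y)         # the model you actually keep and report
```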

Note however that LOOCV has a high variance (the value you will get varies a lot if you use a different random sample of data) which often makes it a bad choice of estimator for performance evaluation, even though it is approximately unbiased. I use it all the time for model selection, but really only because it is cheap (almost free for the kernel models I am working on).
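As an illustration of the model-selection use, here is a sketch with ridge regression, where a closed-form LOO criterion makes the cross-validation cheap; as I understand it, scikit-learn's RidgeCV uses such an efficient LOO scheme by default, and the candidate alphas below are arbitrary:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))                       # toy data with some irrelevant predictors
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=30)

# cv=None (the default) selects alpha with an efficient closed-form LOO criterion,
# then the chosen model is refit on all of the data.
model = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)
print(model.alpha_)                                # regularisation parameter chosen by LOO
```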

Dikran Marsupial
  • Thanks for the answer. Isn't the sentence "use LOO-CV to provide a slightly conservative estimate of the performance of that model" wrong in the general case? The model might get worse if I add another point; in that case the LOO-CV might be an optimistic estimate. – theomega May 04 '12 at 13:14
  • 1
    The more data you use to build the model, generally the better the model is likely to be. While the additional point may make the model a little worse, it is more likely to make the model a little better. So in general loocv has a slight pessimistic bias, but it is only very slight, the variance of the LOOCV estimator is usually a far greater consideration. – Dikran Marsupial May 04 '12 at 13:17
  • 1
    What *should* you use for performance evaluation then? (Assuming data collection is expensive so you want to use all available data to fit the model). – Sideshow Bob Dec 16 '15 at 17:46
  • Bootstrap probably. Most of the models I use have regularisation parameters etc. that need to be tuned, so I often use LOOCV for tuning the models and bootstrap or repeated hold-out for performance evaluation (see the sketch after these comments). – Dikran Marsupial Dec 16 '15 at 19:36
  • @DikranMarsupial Are you sure about the fact that Leave-One-Out CV provides a pessimistic bias? As far as I know, it usually provides a lower error estimate than K-Fold, for example. Also, doesn't LOOCV have 0 variance? You can only do LOOCV once, then "you run out of sample". The only variance I can think of is the one produced by the training algorithms used to fit the model. But this should be variance associated with the variance of the optimal parameters, not with the model error itself. Thank you. – D1X Jul 24 '17 at 17:21
  • @D1X it is very slightly pessimistically biased (as a model with all $n$ patterns used for training would be likely to perform slightly better), but it is *almost* unbiased. It has a high variance in the sense that if you were to repeat the estimate with a *different* set of $n$ patterns, the estimates would vary more from one such sample to another than they would with k-fold cross-validation or the bootstrap. – Dikran Marsupial Jul 25 '17 at 12:26
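Following up on the bootstrap suggestion in the comments above, here is a minimal sketch of one simple variant, an out-of-bag bootstrap estimate of performance (the number of resamples and the toy data are arbitrary; this is an illustration, not necessarily the exact procedure meant in the comment):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))                       # toy stand-in for the real sample
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=30)

n = len(X)
oob_mses = []
for _ in range(200):                               # 200 bootstrap resamples (arbitrary)
    boot = rng.integers(0, n, size=n)              # draw n indices with replacement
    oob = np.setdiff1d(np.arange(n), boot)         # samples never drawn form the test set
    if len(oob) == 0:                              # skip the rare resample with no test points
        continue
    model = LinearRegression().fit(X[boot], y[boot])
    resid = model.predict(X[oob]) - y[oob]
    oob_mses.append(np.mean(resid ** 2))

print(np.mean(oob_mses))                           # out-of-bag bootstrap estimate of the MSE
```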