Questions tagged [cross-validation]

Repeatedly withholding subsets of the data during model fitting in order to quantify the model performance on the withheld data subsets.

Refers to general procedures that attempt to determine the generalizability of a statistical result. Cross-validation arises frequently in the context of assessing how a particular model fit predicts future observations and how to optimally select model parameters.

Methods for cross-validation usually involve withholding a random subset of the data during model fitting (the remaining data form the training set), quantifying how accurately the withheld data (the testing set) are predicted, and repeating this process to obtain a measure of prediction accuracy. When this partitioning happens only once, the procedure is called the holdout method.
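To make the holdout method concrete, here is a minimal sketch in Python, assuming scikit-learn and NumPy are available; the toy data, the logistic-regression model, and the 70/30 split are illustrative choices rather than part of the tag description.

```python
# Minimal holdout sketch: one random split into a training set and a testing set.
# Assumes scikit-learn and NumPy; the toy data and model choice are illustrative.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                          # toy feature matrix
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)   # toy binary response

# Withhold 30% of the data once; fit on the rest.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = LogisticRegression().fit(X_train, y_train)

# Error rate on the single withheld subset.
holdout_error = np.mean(model.predict(X_test) != y_test)
print(f"holdout error estimate: {holdout_error:.3f}")
```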

The holdout method has two basic drawbacks:

  1. When the dataset is small, we may not be able to afford setting aside a portion of it for testing.
  2. Since it is a single train-and-test experiment, the holdout estimate of the error rate can have high variance, depending on how the data happen to be split.

One approach to dealing with these limitations is k-fold cross-validation (a code sketch follows the steps below):

  1. Create k equally sized partitions (folds) of the data. In practice, k is often set to 10.
  2. For each of the k folds, train the model on the other k-1 folds and test it on the held-out fold.
  3. Each of the k experiments yields a prediction error; the average of these k errors is the cross-validation estimate of the error rate.
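A minimal sketch of these three steps, again assuming scikit-learn and NumPy; KFold is used only to produce the index partitions, and the toy data, model, and k = 10 are illustrative choices.

```python
# k-fold cross-validation sketch: partition, train on k-1 folds, test on the
# remaining fold, and average the k fold errors. Toy data and model are illustrative.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)

k = 10
fold_errors = []
for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    # Misclassification rate on the held-out fold.
    fold_errors.append(np.mean(model.predict(X[test_idx]) != y[test_idx]))

# The average over the k folds is the cross-validation estimate of the error rate.
print(f"{k}-fold CV error estimate: {np.mean(fold_errors):.3f}")
```

Here shuffle=True randomizes which observations land in each fold; without it, KFold partitions the data in its original order.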

The advantage of k-fold cross-validation is that every example in the dataset is eventually used for both training and testing, so the estimate depends less on how the data happen to be partitioned; the variance of the resulting estimate is reduced as k is increased. The disadvantage is that the training algorithm has to be rerun from scratch k times, so the evaluation takes k times as much computation.

When we set k = n (the number of observations), this is known as leave-one-out cross-validation, because each model is trained on n-1 observations and tested on the single remaining observation.
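For completeness, a sketch of the k = n case, assuming scikit-learn's LeaveOneOut splitter (equivalent to KFold with n_splits equal to the sample size); the small toy dataset simply keeps the n model fits cheap.

```python
# Leave-one-out CV sketch: n splits, each trained on n-1 observations and
# tested on the single remaining one. Toy data and model are illustrative.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = (X[:, 0] + rng.normal(size=60) > 0).astype(int)

scores = cross_val_score(LogisticRegression(), X, y, cv=LeaveOneOut(), scoring="accuracy")
print(f"LOOCV error estimate: {1 - scores.mean():.3f}")
```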

3195 questions

265 votes, 13 answers
Is there any reason to prefer the AIC or BIC over the other?
The AIC and BIC are both methods of assessing model fit penalized for the number of estimated parameters. As I understand it, BIC penalizes models more for free parameters than does AIC. Beyond a preference based on the stringency of the criteria,…
russellpierce

242 votes, 7 answers
How to choose a predictive model after k-fold cross-validation?
I am wondering how to choose a predictive model after doing K-fold cross-validation. This may be awkwardly phrased, so let me explain in more detail: whenever I run K-fold cross-validation, I use K subsets of the training data, and end up with K…
Berk U.

178 votes, 5 answers
Training on the full dataset after cross-validation?
TL;DR: Is it ever a good idea to train an ML model on all the data available before shipping it to production? Put another way, is it ever OK to train on all data available and not check if the model overfits, or get a final read of the expected…
Amelio Vazquez-Reina

173 votes, 4 answers
Choice of K in K-fold cross-validation
I've been using $K$-fold cross-validation a few times now to evaluate performance of some learning algorithms, but I've always been puzzled as to how I should choose the value of $K$. I've often seen and used a value of $K = 10$, but this seems…
Charles Menguy

131 votes, 4 answers
Nested cross validation for model selection
How can one use nested cross validation for model selection? From what I read online, nested CV works as follows: There is the inner CV loop, where we may conduct a grid search (e.g. running K-fold for every available model, e.g. combination of…
Amelio Vazquez-Reina

130 votes, 4 answers
Differences between cross validation and bootstrapping to estimate the prediction error
I would like your thoughts about the differences between cross validation and bootstrapping to estimate the prediction error. Does one work better for small datasets or for large datasets?
grant

122 votes, 8 answers
Bias and variance in leave-one-out vs K-fold cross validation
How do different cross-validation methods compare in terms of model variance and bias? My question is partly motivated by this thread: Optimal number of folds in $K$-fold cross-validation: is leave-one-out CV always the best choice? The answer…

111 votes, 5 answers
Using k-fold cross-validation for time-series model selection
Question: I want to be sure of something: is the use of k-fold cross-validation with time series straightforward, or does one need to pay special attention before using it? Background: I'm modeling a time series of 6 years (with semi-markov…
Mickaël S

106 votes, 10 answers
Validation error less than training error?
I found two questions here and here about this issue, but there is no obvious answer or explanation yet. I face the same problem where the validation error is less than the training error in my convolutional neural network. What does that mean?

101 votes, 3 answers
Feature selection and cross-validation
I have recently been reading a lot on this site (@Aniko, @Dikran Marsupial, @Erik) and elsewhere about the problem of overfitting occurring with cross-validation (Smialowski et al. 2010, Bioinformatics; Hastie, Elements of Statistical Learning). The…
BGreene

90 votes, 6 answers
Feature selection for "final" model when performing cross-validation in machine learning
I am getting a bit confused about feature selection and machine learning and I was wondering if you could help me out. I have a microarray dataset that is classified into two groups and has 1000s of features. My aim is to get a small number of…

89 votes, 5 answers
On the importance of the i.i.d. assumption in statistical learning
In statistical learning, implicitly or explicitly, one always assumes that the training set $\mathcal{D} = \{\mathbf{X}, \mathbf{y}\}$ is composed of $N$ input/response tuples $(\mathbf{X}_i, y_i)$ that are independently drawn from the same joint…
Quantuple

85 votes, 5 answers
Cross-validation in plain English?
How would you describe cross-validation to someone without a data analysis background?
Shane

75 votes, 1 answer
How to split the dataset for cross validation, learning curve, and final evaluation?
What is an appropriate strategy for splitting the dataset? I ask for feedback on the following approach (not on the individual parameters like test_size or n_iter, but if I used X, y, X_train, y_train, X_test, and y_test appropriately and if the…
tobip

74 votes, 5 answers
Understanding stratified cross-validation
I read in Wikipedia: In stratified k-fold cross-validation, the folds are selected so that the mean response value is approximately equal in all the folds. In the case of a dichotomous classification, this means that each fold contains roughly…
Amelio Vazquez-Reina