Questions tagged [validation]

The process of assessing whether the results of an analysis are likely to hold outside of the original research setting. DO NOT use this tag for discussing the 'validity' of a measurement or instrument (i.e., whether it measures what it purports to); use the [validity] tag instead.

References: Wikipedia and CV Meta's thread

787 questions
530 votes, 11 answers

What is the difference between test set and validation set?

I found this confusing when using the neural network toolbox in Matlab, which divides the raw data set into three parts: training set, validation set, and test set. I notice that in many training or learning algorithms, the data is often divided into 2 parts, the…
xiaohan2012
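
A minimal sketch of the three-way split this question is about, assuming Python and scikit-learn rather than the Matlab toolbox; the toy dataset and the 60/20/20 ratios are illustrative only:

```python
# Hypothetical three-way split: train (fit), validation (tune), test (final check).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% as the test set first...
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# ...then split the remainder into 60% training / 20% validation overall.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

# Fit on X_train, tune settings against X_val, and score X_test only once at the end.
```
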
72 votes, 12 answers

Hold-out validation vs. cross-validation

To me, it seems that hold-out validation is useless. That is, splitting the original dataset into two parts (training and testing) and using the testing score as a generalization measure is somewhat useless. K-fold cross-validation seems to give…
user46925
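
For comparison, a sketch of both estimates side by side, assuming scikit-learn; the classifier and k=5 are arbitrary choices:

```python
# Compare a single hold-out score with a 5-fold cross-validation estimate.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Hold-out: one random split, one score (sensitive to which split you drew).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
holdout = model.fit(X_tr, y_tr).score(X_te, y_te)

# k-fold: every observation is held out exactly once; average over the folds.
cv_scores = cross_val_score(model, X, y, cv=5)
print(holdout, cv_scores.mean(), cv_scores.std())
```
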
55 votes, 3 answers

How to select a clustering method? How to validate a cluster solution (to warrant the method choice)?

One of the biggest issues with cluster analysis is that we may end up drawing different conclusions depending on the clustering method used (including different linkage methods in hierarchical clustering). I would like to know your…
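
One hedged illustration of internal validation, assuming scikit-learn: score candidate methods on the same data with an index such as the silhouette (stability checks and domain interpretation matter just as much and are not shown here). The two methods and k=4 are arbitrary examples:

```python
# Compare two clustering solutions with the silhouette index (higher is better).
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

for name, model in [
    ("k-means, k=4", KMeans(n_clusters=4, n_init=10, random_state=0)),
    ("Ward linkage, k=4", AgglomerativeClustering(n_clusters=4, linkage="ward")),
]:
    labels = model.fit_predict(X)
    print(name, silhouette_score(X, labels))
```
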
43 votes, 4 answers

How do you use the 'test' dataset after cross-validation?

Some lectures and tutorials I've seen suggest splitting your data into three parts: training, validation, and test. But it is not clear how the test dataset should be used, nor how this approach is better than cross-validation over the whole…
Serhiy
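
A sketch of one common reading of this setup, assuming scikit-learn (the model and the C grid are placeholders): cross-validate for model selection on the training portion, then touch the test set exactly once.

```python
# Model selection by cross-validation on the training data; test set used once.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

search = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid={"svc__C": [0.1, 1, 10]},
    cv=5,
)
search.fit(X_train, y_train)                               # tuning sees only the training part

print(search.best_params_, search.score(X_test, y_test))  # final, one-off estimate
```
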
42 votes, 3 answers

Why is it that my colleagues and I learned opposite definitions for test and validation sets?

In my master's program I learned that when building an ML model you: (1) train the model on the training set, (2) compare its performance against the validation set, (3) tweak the settings and repeat steps 1-2, and (4) when you are satisfied, compare the final…
40 votes, 2 answers

How to draw valid conclusions from "big data"?

"Big data" is everywhere in the media. Everybody says that "big data" is the big thing for 2012, e.g. KDNuggets poll on hot topics for 2012. However, I have deep concerns here. With big data, everybody seems to be happy just to get anything out. But…
Has QUIT--Anony-Mousse
33 votes, 3 answers

Do we need a test set when using k-fold cross-validation?

I've been reading about k-fold validation, and I want to make sure I understand how it works. I know that for the holdout method, the data is split into three sets, and the test set is only used at the very end to assess the performance of the…
b_pcakes
30 votes, 3 answers

Should final (production ready) model be trained on complete data or just on training set?

Suppose I trained several models on the training set, chose the best one using the cross-validation set, and measured performance on the test set. So now I have one final best model. Should I retrain it on all my available data or ship the solution trained only on…
Yurii
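
A sketch of the "select with cross-validation, then refit on everything" option raised in the question, assuming scikit-learn; the two candidate models are placeholders:

```python
# Pick the best candidate by cross-validated score, then refit it on all data.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
candidates = {
    "logreg": LogisticRegression(max_iter=5000),
    "forest": RandomForestClassifier(random_state=0),
}

best_name = max(
    candidates,
    key=lambda name: np.mean(cross_val_score(candidates[name], X, y, cv=5)),
)

# The shipped model is refit on every available observation.
final_model = candidates[best_name].fit(X, y)
```
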
24 votes, 2 answers

Bayesian thinking about overfitting

I've devoted much time to development of methods and software for validating predictive models in the traditional frequentist statistical domain. In putting more Bayesian ideas into practice and teaching I see some key differences to embrace. …
23 votes, 2 answers

Scikit correct way to calibrate classifiers with CalibratedClassifierCV

Scikit-learn has CalibratedClassifierCV, which allows us to calibrate our models on a particular X, y pair. Its documentation also states clearly that the data for fitting the classifier and for calibrating it must be disjoint. If they must be disjoint, is it legitimate to…
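
A hedged sketch of one way to respect the disjointness requirement, fitting on one split and calibrating on another; note that cv="prefit" is accepted by many scikit-learn releases, while newer versions steer you toward wrapping the fitted model in FrozenEstimator instead:

```python
# Fit the classifier and calibrate it on disjoint subsets of the data.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)

base = LinearSVC().fit(X_fit, y_fit)          # fitted on the first split only

calibrated = CalibratedClassifierCV(base, cv="prefit", method="sigmoid")
calibrated.fit(X_cal, y_cal)                  # calibrated on the disjoint split

probabilities = calibrated.predict_proba(X_cal[:5])
```
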
23 votes, 4 answers

As a reviewer, can I justify requesting data and code be made available even if the journal does not?

As science must be reproducible, by definition, there is increasing recognition that data and code are an essential component of reproducibility, as discussed by the Yale Roundtable for data and code sharing. In reviewing a manuscript for a…
David LeBauer
23 votes, 4 answers

How bad is hyperparameter tuning outside cross-validation?

I know that performing hyperparameter tuning outside of cross-validation can lead to optimistically biased estimates of external validity, because the dataset that you use to measure performance is the same one you used to tune the features. What I'm…
Ben Kuhn
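
One standard point of comparison is nested cross-validation; a sketch assuming scikit-learn, with an illustrative C grid, where tuning happens inside each outer fold so the outer score never reuses tuning data:

```python
# Nested CV: inner loop tunes hyperparameters, outer loop estimates performance.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

inner = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid={"svc__C": [0.1, 1, 10]},
    cv=3,
)
outer_scores = cross_val_score(inner, X, y, cv=5)  # tuning repeated per outer fold
print(outer_scores.mean(), outer_scores.std())
```
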
20 votes, 3 answers

How can we judge the accuracy of Nate Silver's predictions?

Firstly, he gives probabilities of outcomes. So, for example, his prediction for the U.S. election is currently 82% Clinton vs 18% Trump. Now, even if Trump wins, how do I know that it wasn't just the 18% of the time that he should've won? The other…
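
A single 82%/18% call cannot settle this, but a long track record can be scored; a toy sketch of the Brier score with made-up forecasts and outcomes:

```python
# Brier score for probabilistic forecasts: mean squared gap between the
# predicted probability and what actually happened (lower is better).
import numpy as np

forecasts = np.array([0.82, 0.60, 0.95, 0.30, 0.75])  # predicted P(event occurs)
outcomes = np.array([1, 0, 1, 0, 1])                   # observed results (made up)

print(np.mean((forecasts - outcomes) ** 2))
```
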
19 votes, 3 answers

Splitting Time Series Data into Train/Test/Validation Sets

What's the best way to split time series data into train/test/validation sets, where the validation set would be used for hyperparameter tuning? We have 3 years' worth of daily sales data, and our plan is to use 2015-2016 as the training data, then…
meraxes
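
One common scheme is to keep every split in temporal order; a sketch assuming scikit-learn's TimeSeriesSplit on a made-up daily series (the real data and the 2015-2017 cut-offs from the question are not reproduced here):

```python
# Expanding-window splits: each validation block comes after its training window.
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

dates = pd.date_range("2015-01-01", "2017-12-31", freq="D")
sales = pd.Series(np.random.default_rng(0).normal(size=len(dates)), index=dates)

for train_idx, val_idx in TimeSeriesSplit(n_splits=5).split(sales):
    print(sales.index[train_idx][-1].date(), "->", sales.index[val_idx][-1].date())

# The most recent block (e.g. the final months) can still be reserved as a test set.
```
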
18 votes, 4 answers

Is leave-one-out cross validation (LOOCV) known to systematically overestimate error?

Let's assume that we want to build a regression model that needs to predict the temperature in a building. We start from a very simple model in which we assume that the temperature only depends on the weekday. Now we want to use k-fold cross-validation to check…
Roman
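
A sketch that simply computes both estimates on the same model so they can be compared, assuming scikit-learn; the dataset is a stand-in for the temperature example:

```python
# Leave-one-out vs. 10-fold cross-validation error for the same regression model.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_diabetes(return_X_y=True)
model = LinearRegression()

loo_mse = -cross_val_score(model, X, y, cv=LeaveOneOut(),
                           scoring="neg_mean_squared_error").mean()
kfold_mse = -cross_val_score(model, X, y, cv=10,
                             scoring="neg_mean_squared_error").mean()
print(loo_mse, kfold_mse)
```
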