Questions tagged [validation]

The process of assessing whether the results of an analysis are likely to hold outside of the original research setting. DO NOT use this tag for discussing the 'validity' of a measurement or instrument (i.e., whether it measures what it purports to); use the [validity] tag instead.

References: Wikipedia and CV Meta's thread

787 questions
530 votes, 11 answers

What is the difference between test set and validation set?

I found this confusing when using the neural network toolbox in Matlab, which divides the raw data set into three parts: training set, validation set, and test set. I notice that in many training or learning algorithms, the data is often divided into 2 parts, the…
xiaohan2012
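
A minimal sketch of the three-way split this question is about, assuming Python and scikit-learn rather than the Matlab toolbox; the toy dataset and the 60/20/20 ratios are illustrative only:

```python
# Hypothetical three-way split: train (fit), validation (tune), test (final check).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% as the test set first...
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# ...then split the remainder into 60% training / 20% validation overall.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

# Fit on X_train, tune settings against X_val, and score X_test only once at the end.
```
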
72 votes, 12 answers

Hold-out validation vs. cross-validation

To me, it seems that hold-out validation is useless. That is, splitting the original dataset into two parts (training and testing) and using the testing score as a generalization measure is somewhat useless. K-fold cross-validation seems to give…
user46925
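
For comparison, a sketch of both estimates side by side, assuming scikit-learn; the classifier and k=5 are arbitrary choices:

```python
# Compare a single hold-out score with a 5-fold cross-validation estimate.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Hold-out: one random split, one score (sensitive to which split you drew).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
holdout = model.fit(X_tr, y_tr).score(X_te, y_te)

# k-fold: every observation is held out exactly once; average over the folds.
cv_scores = cross_val_score(model, X, y, cv=5)
print(holdout, cv_scores.mean(), cv_scores.std())
```
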
55 votes, 3 answers

How to select a clustering method? How to validate a cluster solution (to warrant the method choice)?

One of the biggest issues with cluster analysis is that we may end up drawing different conclusions depending on the clustering method used (including different linkage methods in hierarchical clustering). I would like to know your…
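
One hedged illustration of internal validation, assuming scikit-learn: score candidate methods on the same data with an index such as the silhouette (stability checks and domain interpretation matter just as much and are not shown here). The two methods and k=4 are arbitrary examples:

```python
# Compare two clustering solutions with the silhouette index (higher is better).
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

for name, model in [
    ("k-means, k=4", KMeans(n_clusters=4, n_init=10, random_state=0)),
    ("Ward linkage, k=4", AgglomerativeClustering(n_clusters=4, linkage="ward")),
]:
    labels = model.fit_predict(X)
    print(name, silhouette_score(X, labels))
```
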
43 votes, 4 answers

How do you use the 'test' dataset after cross-validation?

Some lectures and tutorials I've seen suggest splitting your data into three parts: training, validation, and test. But it is not clear how the test dataset should be used, nor how this approach is better than cross-validation over the whole…
Serhiy
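
A sketch of one common reading of this setup, assuming scikit-learn (the model and the C grid are placeholders): cross-validate for model selection on the training portion, then touch the test set exactly once.

```python
# Model selection by cross-validation on the training data; test set used once.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

search = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid={"svc__C": [0.1, 1, 10]},
    cv=5,
)
search.fit(X_train, y_train)                               # tuning sees only the training part

print(search.best_params_, search.score(X_test, y_test))  # final, one-off estimate
```
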
42 votes, 3 answers

Why is it that my colleagues and I learned opposite definitions for test and validation sets?

In my master's program I learned that when building an ML model you: (1) train the model on the training set, (2) compare its performance against the validation set, (3) tweak the settings and repeat steps 1-2, and (4) when you are satisfied, compare the final…
40 votes, 2 answers

How to draw valid conclusions from "big data"?

"Big data" is everywhere in the media. Everybody says that "big data" is the big thing for 2012, e.g. KDNuggets poll on hot topics for 2012. However, I have deep concerns here. With big data, everybody seems to be happy just to get anything out. But…
Has QUIT--Anony-Mousse
33 votes, 3 answers

Do we need a test set when using k-fold cross-validation?

I've been reading about k-fold validation, and I want to make sure I understand how it works. I know that for the holdout method, the data is split into three sets, and the test set is only used at the very end to assess the performance of the…
b_pcakes
30 votes, 3 answers

Should final (production ready) model be trained on complete data or just on training set?

Suppose I trained several models on the training set, chose the best one using the cross-validation set, and measured performance on the test set. So now I have one final best model. Should I retrain it on all my available data or ship the solution trained only on…
Yurii
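
A sketch of the "select with cross-validation, then refit on everything" option raised in the question, assuming scikit-learn; the two candidate models are placeholders:

```python
# Pick the best candidate by cross-validated score, then refit it on all data.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
candidates = {
    "logreg": LogisticRegression(max_iter=5000),
    "forest": RandomForestClassifier(random_state=0),
}

best_name = max(
    candidates,
    key=lambda name: np.mean(cross_val_score(candidates[name], X, y, cv=5)),
)

# The shipped model is refit on every available observation.
final_model = candidates[best_name].fit(X, y)
```
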
24 votes, 2 answers

Bayesian thinking about overfitting

I've devoted much time to development of methods and software for validating predictive models in the traditional frequentist statistical domain. In putting more Bayesian ideas into practice and teaching I see some key differences to embrace. …
23 votes, 2 answers

Scikit correct way to calibrate classifiers with CalibratedClassifierCV

Scikit-learn has CalibratedClassifierCV, which allows us to calibrate our models on a particular X, y pair. Its documentation also states clearly that the data for fitting the classifier and for calibrating it must be disjoint. If they must be disjoint, is it legitimate to…
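
A hedged sketch of one way to respect the disjointness requirement, fitting on one split and calibrating on another; note that cv="prefit" is accepted by many scikit-learn releases, while newer versions steer you toward wrapping the fitted model in FrozenEstimator instead:

```python
# Fit the classifier and calibrate it on disjoint subsets of the data.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)

base = LinearSVC().fit(X_fit, y_fit)          # fitted on the first split only

calibrated = CalibratedClassifierCV(base, cv="prefit", method="sigmoid")
calibrated.fit(X_cal, y_cal)                  # calibrated on the disjoint split

probabilities = calibrated.predict_proba(X_cal[:5])
```
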
23 votes, 4 answers

As a reviewer, can I justify requesting data and code be made available even if the journal does not?

As science must be reproducible, by definition, there is increasing recognition that data and code are an essential component of reproducibility, as discussed by the Yale Roundtable for data and code sharing. In reviewing a manuscript for a…
David LeBauer
23 votes, 4 answers

How bad is hyperparameter tuning outside cross-validation?

I know that performing hyperparameter tuning outside of cross-validation can lead to optimistically biased estimates of external validity, because the dataset that you use to measure performance is the same one you used to tune the features. What I'm…
Ben Kuhn
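
One standard point of comparison is nested cross-validation; a sketch assuming scikit-learn, with an illustrative C grid, where tuning happens inside each outer fold so the outer score never reuses tuning data:

```python
# Nested CV: inner loop tunes hyperparameters, outer loop estimates performance.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

inner = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid={"svc__C": [0.1, 1, 10]},
    cv=3,
)
outer_scores = cross_val_score(inner, X, y, cv=5)  # tuning repeated per outer fold
print(outer_scores.mean(), outer_scores.std())
```
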
20 votes, 3 answers

How can we judge the accuracy of Nate Silver's predictions?

Firstly, he gives probabilities of outcomes. So, for example, his prediction for the U.S. election is currently 82% Clinton vs 18% Trump. Now, even if Trump wins, how do I know that it wasn't just the 18% of the time that he should've won? The other…
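
A single 82%/18% call cannot settle this, but a long track record can be scored; a toy sketch of the Brier score with made-up forecasts and outcomes:

```python
# Brier score for probabilistic forecasts: mean squared gap between the
# predicted probability and what actually happened (lower is better).
import numpy as np

forecasts = np.array([0.82, 0.60, 0.95, 0.30, 0.75])  # predicted P(event occurs)
outcomes = np.array([1, 0, 1, 0, 1])                   # observed results (made up)

print(np.mean((forecasts - outcomes) ** 2))
```
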
19 votes, 3 answers

Splitting Time Series Data into Train/Test/Validation Sets

What's the best way to split time series data into train/test/validation sets, where the validation set would be used for hyperparameter tuning? We have 3 years' worth of daily sales data, and our plan is to use 2015-2016 as the training data, then…
meraxes
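
One common scheme is to keep every split in temporal order; a sketch assuming scikit-learn's TimeSeriesSplit on a made-up daily series (the real data and the 2015-2017 cut-offs from the question are not reproduced here):

```python
# Expanding-window splits: each validation block comes after its training window.
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

dates = pd.date_range("2015-01-01", "2017-12-31", freq="D")
sales = pd.Series(np.random.default_rng(0).normal(size=len(dates)), index=dates)

for train_idx, val_idx in TimeSeriesSplit(n_splits=5).split(sales):
    print(sales.index[train_idx][-1].date(), "->", sales.index[val_idx][-1].date())

# The most recent block (e.g. the final months) can still be reserved as a test set.
```
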
18 votes, 4 answers

Is leave-one-out cross validation (LOOCV) known to systematically overestimate error?

Let's assume that we want to build a regression model that needs to predict the temperature in a building. We start from a very simple model in which we assume that the temperature only depends on the weekday. Now we want to use k-fold cross-validation to check…
Roman
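
A sketch that simply computes both estimates on the same model so they can be compared, assuming scikit-learn; the dataset is a stand-in for the temperature example:

```python
# Leave-one-out vs. 10-fold cross-validation error for the same regression model.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_diabetes(return_X_y=True)
model = LinearRegression()

loo_mse = -cross_val_score(model, X, y, cv=LeaveOneOut(),
                           scoring="neg_mean_squared_error").mean()
kfold_mse = -cross_val_score(model, X, y, cv=10,
                             scoring="neg_mean_squared_error").mean()
print(loo_mse, kfold_mse)
```
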