
In this previous question of mine, an insightful comment by ReneBT pointed out that the usage of the three terms training (data) set, validation set and test set is not uniform across the Cross Validated community (which led to different interpretations in the answers to my question, whose first version did not pin down the terminology for these terms).

Can you please point out what the proper usage is, as opposed to the common usage? And what variations of the common usage are often encountered in practice? (This highly upvoted question has collected some definitions, but even there, there seems to be some disagreement about the terminology...)

It seems to me that, in particular regarding validation, there are various different approaches: use cross-validation on the training data set to do model selection (and find the optimal model parameters), or use a separate validation set for that.
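To make the two approaches concrete, here is a minimal sketch; scikit-learn, ridge regression and the `alpha` grid are purely illustrative choices of mine, not part of the discussion above:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# toy data standing in for a real problem
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 5)), rng.normal(size=200)

# Approach 1: cross-validation on the training set does the model selection;
# the test set is only touched once, at the very end.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
search = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)
print("CV-selected alpha:", search.best_params_["alpha"],
      "test score:", search.score(X_test, y_test))

# Approach 2: a separate validation set does the model selection.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
best_alpha = max([0.01, 0.1, 1.0, 10.0],
                 key=lambda a: Ridge(alpha=a).fit(X_train, y_train).score(X_val, y_val))
final_model = Ridge(alpha=best_alpha).fit(X_train, y_train)
print("validation-selected alpha:", best_alpha,
      "test score:", final_model.score(X_test, y_test))
```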

l7ll7
  • In biology there's a committee of some sort that declares what the true name of a species is. In astronomy, there's a committee that names celestial objects, and determines their class (e.g. that Pluto is not a planet). There is no such governing body in statistics that sets rules and ensures that they are obeyed. Things are named by consensus - sometimes the same thing is given a different name, sometimes different things are given the same name (fixed and random effects are examples of both.) – Jeremy Miles May 14 '19 at 15:28
  • @JeremyMiles I know, in math it's the same; this is standard practice in the community. The question is about understanding the (injective) mappings of terms to meanings and of meanings to terms. – l7ll7 May 15 '19 at 02:15
  • I don't think you'll ever do that. There will never be injective mappings of terms to meanings (except for a small set of terms and meanings), because no one can say what the terms are. Many articles start by defining their terms, to make sure that we're all on the same page. (My personal bugbear in the other direction: True positive rate (TPR), Recall, Sensitivity, probability of detection, Power all mean the same thing - can't we agree on a single term?) – Jeremy Miles May 15 '19 at 15:02
  • Another bugbear - the multiple meanings of alpha. – Jeremy Miles May 15 '19 at 15:03

1 Answer


Still to be completed; this is just an early saved version so far.

I think this is a very good and important question: IMHO the common (frequently used) terminology causes a lot of confusion.

I'll first outline how I think the common usage of the terms evolved, and why I find it particularly confusing. In the 2nd part I'll propose an alternative naming scheme that hopefully avoids this confusion.

Common/frequent usage

The most common terminology I see is train/validation/test (see e.g. Wikipedia: Training, validation, and test sets).

I think this splitting terminology developed (historically)

  1. In easy/comfortable circumstances, i.e.

    • small number of features $p$,
    • sample size $n \gg p$ - even if $n$ itself is not that large,
    • no substructure (clustering, data hierarchy, correlation between [groups of] samples/cases) within the data and
    • low model complexity together with
    • a mathematically well-understood model (e.g. a linear model),

    both the model and its generalization error can be derived directly (analytically) from within the data set (e.g. prediction intervals of a univariate linear model; see the sketch after this list).

    In this situation,

    • the risk of overfitting is negligible (due to $n \gg p$ together with the low complexity of the model: we have many degrees of freedom left),
    • we can therefore use the training error as a good approximation to the generalization error, and
    • $n \gg p$ may already be reached with a rather small absolute $n$ (e.g. a univariate linear model with slope and intercept does fine with $n = 10$), and analytical expressions such as prediction intervals are available.
  2. As the number of features $p$ increases, it becomes more difficult to keep
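Returning to the easy case in point 1 (the sketch referred to above): here is a minimal illustration of how a generalization-error statement, in this case a 95 % prediction interval for a univariate linear model, comes analytically from the training data alone, without any validation or test split. The NumPy/SciPy implementation and the toy numbers are my own assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 10                                    # small n, but still n >> p (here p = 2 coefficients)
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x])      # design matrix: intercept + slope
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = n - X.shape[1]
s2 = resid @ resid / dof                  # residual variance estimate

x_new = np.array([1.0, 0.5])              # predict at x = 0.5
# standard error for a *new observation* (prediction interval, not confidence interval)
se_pred = np.sqrt(s2 * (1.0 + x_new @ np.linalg.inv(X.T @ X) @ x_new))
t_crit = stats.t.ppf(0.975, dof)
print(f"prediction: {x_new @ beta:.3f} +/- {t_crit * se_pred:.3f} (95 % PI)")
```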

Difficulties with this terminology

  • Verification and validation

Proposed terminology:

  • training
  • optimization
  • verification
cbeleites unhappy with SX