
In this previous question of mine, an insightful comment by ReneBT pointed out that the usage of the three terms training (data) set, validation set and test set is not uniform across the Cross Validated community (which led to different interpretations in the answers to my question, whose first version did not pin down the terminology for these terms).

Can you please point out what the proper usage is, as opposed to the common usage? And what variations of the common usage are often encountered in practice? (This highly upvoted question has collected some definitions, but even there, there seems to be some disagreement about the terminology...)

It seems to me that, in particular regarding validation, there are various different approaches: use cross-validation on the training data set to do model selection (and find the optimal model parameters), or use a separate validation set for that.
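To make the two approaches concrete, here is a minimal sketch; scikit-learn, ridge regression and the `alpha` grid are purely illustrative choices of mine, not part of the discussion above:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# toy data standing in for a real problem
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 5)), rng.normal(size=200)

# Approach 1: cross-validation on the training set does the model selection;
# the test set is only touched once, at the very end.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
search = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)
print("CV-selected alpha:", search.best_params_["alpha"],
      "test score:", search.score(X_test, y_test))

# Approach 2: a separate validation set does the model selection.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
best_alpha = max([0.01, 0.1, 1.0, 10.0],
                 key=lambda a: Ridge(alpha=a).fit(X_train, y_train).score(X_val, y_val))
final_model = Ridge(alpha=best_alpha).fit(X_train, y_train)
print("validation-selected alpha:", best_alpha,
      "test score:", final_model.score(X_test, y_test))
```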

l7ll7
  • In biology there's a committee of some sort that declares what the true name of a species is. In astronomy, there's a committee that names celestial objects, and determines their class (e.g. that Pluto is not a planet). There is no such governing body in statistics that sets rules and ensures that they are obeyed. Things are named by consensus - sometimes the same thing is given a different name, sometimes different things are given the same name (fixed and random effects are examples of both.) – Jeremy Miles May 14 '19 at 15:28
  • @JeremyMiles I know, in math it's the same; this is standard practice in the community. The question is about understanding the (injective) mappings of terms to meanings and of meanings to terms. – l7ll7 May 15 '19 at 02:15
  • I don't think you'll ever do that. There will never be injective mappings of terms to meanings (except for a small set of terms and meanings), because no one can say what the terms are. Many articles start by defining their terms, to make sure that we're all on the same page. (My personal bugbear in the other direction: True positive rate (TPR), Recall, Sensitivity, probability of detection, Power all mean the same thing - can't we agree on a single term?) – Jeremy Miles May 15 '19 at 15:02
  • Another bugbear - the multiple meanings of alpha. – Jeremy Miles May 15 '19 at 15:03

1 Answer


Still to be completed; this is just an early saved version so far.

I think this is a very good and important question: IMHO the common (frequently used) terminology causes a lot of confusion.

I'll first outline how I think the common usage of the terms evolved, and why I find it particularly confusing. In the 2nd part I'll propose an alternative naming scheme that hopefully avoids this confusion.

Common/frequent usage

The most common terminology I see is train/validation/test (see e.g. Wikipedia: Training, validation, and test sets).

I think this splitting terminology developed (historically)

  1. In easy/comfortable circumstances, i.e.

    • small number of features $p$,
    • sample size $n \gg p$ - even if $n$ itself is not that large,
    • no substructure (clustering, data hierarchy, correlation between [groups of] samples/cases) within the data and
    • low model complexity together with
    • a mathematically well-understood model (e.g. a linear model),

    both the model and its generalization error can be derived directly (analytically) from within the data set (e.g. prediction intervals of a univariate linear model; see the sketch after this list).

    In this situation,

    • the risk of overfitting is negligible (due to $n \gg p$ together with the low complexity of the model: we have many degrees of freedom left),
    • we can therefore use the training error as a good approximation to the generalization error, and
    • $n \gg p$ may already be reached with a rather small absolute $n$ (e.g. a univariate linear model with slope and intercept does fine with $n = 10$), and analytical expressions such as prediction intervals are available.
  2. As the number of features $p$ increases, it becomes more difficult to keep
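Returning to the easy case in point 1 (the sketch referred to above): here is a minimal illustration of how a generalization-error statement, in this case a 95 % prediction interval for a univariate linear model, comes analytically from the training data alone, without any validation or test split. The NumPy/SciPy implementation and the toy numbers are my own assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 10                                    # small n, but still n >> p (here p = 2 coefficients)
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x])      # design matrix: intercept + slope
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = n - X.shape[1]
s2 = resid @ resid / dof                  # residual variance estimate

x_new = np.array([1.0, 0.5])              # predict at x = 0.5
# standard error for a *new observation* (prediction interval, not confidence interval)
se_pred = np.sqrt(s2 * (1.0 + x_new @ np.linalg.inv(X.T @ X) @ x_new))
t_crit = stats.t.ppf(0.975, dof)
print(f"prediction: {x_new @ beta:.3f} +/- {t_crit * se_pred:.3f} (95 % PI)")
```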

Difficulties with this terminology

  • Verification and validation

Proposed terminology:

  • training
  • optimization
  • verification
cbeleites unhappy with SX