What is cross-validation?

Question

I'm having trouble understanding what cross-validation is.

Also, what is the connection between cross-validation and the issue of model overfitting?

What is your understanding so far, say from the [Wikipedia article](https://en.wikipedia.org/wiki/Cross-validation_(statistics))? — Alexis, Dec 11 '14 at 06:36
You could check also [this](http://stats.stackexchange.com/questions/14474/compendium-of-cross-validation-techniques) and [this](http://stats.stackexchange.com/questions/61783/model-variance-and-bias-in-cross-validation), and the references quoted in the answers. — Tim, Dec 11 '14 at 07:29

Jon · Answer 1 · 2014-12-11T09:59:48.033

11

"Cross-validating" refers to attempts to check ("validate") whether the outcome of a statistical analysis will generalise to another sample. Typically, these involve running the same analyses ("cross") on comparable subsets of your data: e.g., you might divide your dataset into two (a "training" set and a "validation" set) and run the analysis on both.

An overfitted model is one which adequately, albeit artificially describes your current dataset, but performs poorly on yet unseen (but equivalent) datasets. This can happen if your model was derived ad hocly. Cross-validation is one way of checking against overfitting.

edited Dec 11 '14 at 09:59

answered Dec 11 '14 at 09:50

Jon

368
1
7

2

Helpful answer, and nice use of "ad hocly." – rolando2 Dec 11 '14 at 09:58
1

I believe "...derived *ad hoc*" would be reasonably grammatical (I think it's literally 'to this', but sometimes/usually rendered as 'for this') – Glen_b Dec 11 '14 at 10:19
2

(+1) But it might be useful to (1) explain deriving models "ad hocly", & (2) note that estimating too many parameters can lead to overfitting even with the most unadhocly procedures. – Scortchi - Reinstate Monica Dec 11 '14 at 12:10
1

I laughed too hard at *unadhocly*. – Marc Claesen Dec 11 '14 at 14:43

What is cross-validation?

1 Answers1

Linked