9

I'm having trouble understanding what cross-validation is.

Also, what is the connection between cross-validation and the issue of model overfitting?

Glen_b
  • 257,508
  • 32
  • 553
  • 939
user3851283
  • 307
  • 2
  • 4
  • 3
  • 1
    What is your understanding so far, say from the [Wikipedia article](https://en.wikipedia.org/wiki/Cross-validation_(statistics))? – Alexis Dec 11 '14 at 06:36
  • 1
    You could check also [this](http://stats.stackexchange.com/questions/14474/compendium-of-cross-validation-techniques) and [this](http://stats.stackexchange.com/questions/61783/model-variance-and-bias-in-cross-validation), and the references quoted in the answers. – Tim Dec 11 '14 at 07:29

1 Answers1

11

"Cross-validating" refers to attempts to check ("validate") whether the outcome of a statistical analysis will generalise to another sample. Typically, these involve running the same analyses ("cross") on comparable subsets of your data: e.g., you might divide your dataset into two (a "training" set and a "validation" set) and run the analysis on both.

An overfitted model is one which adequately, albeit artificially describes your current dataset, but performs poorly on yet unseen (but equivalent) datasets. This can happen if your model was derived ad hocly. Cross-validation is one way of checking against overfitting.

Jon
  • 368
  • 1
  • 7