What is a good academic citation for cross-validation?

Question

I performed a holdout cross-validation analysis on a multilevel model fit. The purpose was to show that we didn't have a problem with over-fitting, and it worked just fine for that. Now we are writing it up for publication and I need a citation to support my methodology. I am looking for a good canonical statistical reference, ideally a book, that does a nice job of explaining why holdout cross-validation is a real thing that people do and why it makes sense in this application. The paper will be published in a biological journal, so I am looking to point non-statistical types to a general reference. Somehow, none of my books seem to quite do it. The Wikipedia entry would be perfectly adequate for my purpose, but I'd rather not cite Wikipedia. Any suggestions?

"Holdout" and "cross-validation" are slightly contradictory. Most people who use the term "holdout sample" are doing 1-fold cross-validation, a highly inefficient approach. Please clarify. 100 repeats of 10-fold cross-validation, or the bootstrap, would be good approaches. – Frank Harrell, Jan 12 '12 at 13:13
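
For readers unfamiliar with the repeated 10-fold scheme mentioned in the comment above, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not the asker's analysis: the data are simulated, the model is a simple straight-line fit rather than a multilevel model, and the helper names (`fit_line`, `mse`, `repeated_kfold_mse`) are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted (intercept, slope) pair."""
    a, b = model
    return statistics.fmean([(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)])


def repeated_kfold_mse(xs, ys, k=10, repeats=100, seed=0):
    """Average out-of-fold MSE over `repeats` random k-fold partitions."""
    rng = random.Random(seed)
    n = len(xs)
    fold_errors = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        # k disjoint folds that together cover every observation once
        folds = [idx[i::k] for i in range(k)]
        for held_out in folds:
            held = set(held_out)
            train = [i for i in idx if i not in held]
            model = fit_line([xs[i] for i in train], [ys[i] for i in train])
            fold_errors.append(
                mse(model, [xs[i] for i in held_out], [ys[i] for i in held_out])
            )
    return statistics.fmean(fold_errors)


# Simulated data (an assumption for illustration): y = 2 + 3x + unit-variance noise
data_rng = random.Random(1)
xs = [data_rng.uniform(0, 10) for _ in range(200)]
ys = [2 + 3 * x + data_rng.gauss(0, 1) for x in xs]

in_sample = mse(fit_line(xs, ys), xs, ys)
cv_error = repeated_kfold_mse(xs, ys)
print(f"in-sample MSE: {in_sample:.3f}; repeated 10-fold CV MSE: {cv_error:.3f}")
```

A cross-validated error close to the in-sample error is the kind of evidence against over-fitting the question describes. A single holdout split corresponds to evaluating just one of these folds, which is why its estimate is noisier.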

Momo · Answer 1 · 2012-01-12T11:49:04.897

6

I find Chapter 7 of Hastie, Tibshirani, and Friedman's The Elements of Statistical Learning to be a good reference for CV and for how and why it is used.

edited Jan 12 '12 at 11:49

answered Jan 12 '12 at 02:44



Thanks. This will work nicely and it looks like a great book too; I'll probably actually buy it (!). – yolio Jan 12 '12 at 18:11
@yolio, it's always nice to have a physical copy, and it's nice to support the authors for their work, but also note that the authors have put a copy in pdf format on their website. On the page supplied above, click "download the book". – gung - Reinstate Monica Jan 13 '12 at 18:23
