
My understanding of cross-validation is that we split the data into n partitions (folds), train on n-1 of them, and test on the remaining one, cycling through all n folds. This means we end up with n models, each tested once. So aggregating the error across folds is really about evaluating the algorithm, or the model parameters (e.g. how many trees in a gradient boosting machine), rather than any one fitted model. For instance, I could do one run with 1k trees and another with 5k trees, and even though the fitted models differ from fold to fold, it should give me a better sense of whether 1k or 5k trees is the better setting.
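
For concreteness, here is a minimal sketch of that comparison using scikit-learn; the dataset (from `make_regression`) and the exact settings are made up for illustration, not taken from the question. Each candidate number of trees is scored by averaging the held-out error over the folds, so what gets compared is the procedure with 1k vs. 5k trees rather than any single fitted model.

```python
# Minimal sketch: compare two hyperparameter settings via k-fold CV.
# Data and parameter values are illustrative only.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

for n_trees in (1000, 5000):
    model = GradientBoostingRegressor(n_estimators=n_trees, random_state=0)
    # cross_val_score fits a fresh model on each training fold and scores it
    # on the corresponding held-out fold; we compare the fold averages.
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    print(f"{n_trees} trees: mean CV MSE = {-scores.mean():.2f} (+/- {scores.std():.2f})")
```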

However, since each model is tested only once, this doesn't give me a more thorough test of any individual model. Is this a concern? Are there other ways to do that? Am I missing something?

jojo
  • +1. Cross-validation is "about evaluating the algorithm"; that is exactly correct. As @DikranMarsupial likes to say, "[It is best to think of cross-validation as a way of estimating the generalisation performance of models generated by a particular procedure, rather than of the model itself](http://stats.stackexchange.com/a/27456/28666)". – amoeba Dec 16 '15 at 17:00
  • In fact, your question can be considered a duplicate of that one. I suggest you read Dikran's answer there and, if something remains unclear, edit your question to specify the remaining concern. – amoeba Dec 16 '15 at 17:09
  • Thanks! I'll check out Dikran's response. I don't think I can upvote a comment. I guess the only remaining question is whether I could get a reference on evaluation techniques that are more about thoroughly evaluating a single model, to protect against (or at least measure) overfitting. Just trying to make sure I'm not missing something. I imagine repeated sampling, leave-one-out, repeated CV, etc. (see the sketch after these comments). – jojo Dec 16 '15 at 18:59
  • What exactly do you mean when you say "single model"? – amoeba Dec 16 '15 at 19:39
  • See also a more detailed answer by Dikran at http://stats.stackexchange.com/questions/11602. – amoeba Dec 16 '15 at 19:44
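
On the "repeated CV" option mentioned in the comments, here is a minimal sketch (again with made-up data and settings). It repeats the whole k-fold split several times with different shuffles, which reduces the variance of the error estimate and shows how much it moves from split to split; as Dikran's linked answer explains, this is still an evaluation of the fitting procedure rather than of one particular fitted model.

```python
# Minimal sketch: repeated k-fold CV with scikit-learn (illustrative data).
# This estimates the performance of the procedure (algorithm + hyperparameters),
# not of one fitted model; the spread of the scores mainly shows how stable
# that estimate is across different random splits.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)
model = GradientBoostingRegressor(n_estimators=1000, random_state=0)

rkf = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)  # 50 fits in total
scores = -cross_val_score(model, X, y, cv=rkf, scoring="neg_mean_squared_error")
print(f"mean MSE = {scores.mean():.2f}, std across folds/repeats = {scores.std():.2f}")
```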

0 Answers