My understanding of cross validation is that we split the data into n folds, train on n-1 of them, and test on the remaining one, repeating this for all n folds. This means we end up with n models, each tested once. So summing up the error is really about evaluating the algorithm or the model parameters (e.g. how many trees in a gradient boosting machine) rather than any single model. For instance, I could do a run with 1k trees and a run with 5k trees, and even though the fitted models differ from fold to fold, the aggregated error should give me a better sense of whether 1k or 5k trees is the better setting (see the sketch below for what I mean).
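For concreteness, here is a minimal sketch of the comparison I have in mind, assuming scikit-learn's GradientBoostingRegressor and a toy dataset (the n_estimators values just stand in for the 1k/5k-trees example):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Toy data purely for illustration
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

for n_trees in (1000, 5000):
    model = GradientBoostingRegressor(n_estimators=n_trees, random_state=0)
    # 5-fold CV: five separate models are fit, each scored once on its held-out fold
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print(f"{n_trees} trees: mean CV MSE = {-scores.mean():.2f}")
```

The comparison is between the two mean CV scores, not between any of the ten individual fitted models.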
However, since each model is tested only once, this doesn't really give me a better test of any individual model. Is this a concern? Are there other ways to do that? Am I missing something?