First of all, I agree with @HEITZ: if all we have is equal cross validation performance, then that's all we have and it does not allow further distinction. Also, one model may be just as badly underfit as the other is overfit...
As usual, this is where external (independent) knowledge about the situation at hand helps a lot, e.g. in judging what is going on. I'm thinking of, say, a discriminative classifier vs. a one-class classifier that both yield the same predictions and thus the same error/performance measure. The one-class classifier is the more complex one - but the decision one-class vs. discriminative classifier should anyway be based on the nature of the data/application. And yet, there may be situations where one concludes that one-class classification would be needed, but the available data requires a more restrictive model (with important differences in the CV-measured performance).
However, I'd like to point out that it is possible to measure some symptoms of overfitting (namely, instability of predictions when a few training cases are exchanged) by iterated/repeated cross validation, even if the chosen error measure per se does not penalize complexity.
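A minimal sketch of what I mean, in Python/scikit-learn with hypothetical data (the model and the instability summary are just placeholders, not a fixed recipe): repeat the cross validation with reshuffled splits, collect the out-of-fold prediction for each case in each repetition, and look at how often those predictions flip.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import StratifiedKFold
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)
    model = DecisionTreeClassifier()          # stand-in for the "complex" model
    n_repeats, n_folds = 50, 5

    # one out-of-fold prediction per case and per repetition
    preds = np.empty((n_repeats, len(y)), dtype=int)
    for r in range(n_repeats):
        cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=r)
        for train, test in cv.split(X, y):
            model.fit(X[train], y[train])
            preds[r, test] = model.predict(X[test])

    # instability symptom: how often does a case's prediction deviate from
    # that case's majority prediction across the repetitions?
    majority = np.round(preds.mean(axis=0))
    instability = (preds != majority).mean()
    print(f"fraction of predictions flipping across repetitions: {instability:.3f}")

If that fraction is substantially above what you'd expect from the estimated error rate alone, the model reacts strongly to exchanging a few training cases - which is exactly the instability symptom of overfitting.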
Therefore, I reserve the right not to believe that the complex model is free of overfitting unless results are presented that clearly show that possible overfitting was checked for and found to be absent, and that exclude the possibility of a lucky cross validation split being reported (particularly if the complex model has hyperparameters that were aggressively optimized).
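For the "aggressively optimized hyperparameters" case, the usual safeguard is nested cross validation: the inner loop does the tuning, the outer loop measures performance on cases that never took part in the tuning. Again only a sketch with hypothetical data and an arbitrary model/grid:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)

    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

    # inner loop: (aggressive) hyperparameter optimization
    tuned = GridSearchCV(SVC(),
                         {"C": np.logspace(-2, 3, 6), "gamma": np.logspace(-4, 1, 6)},
                         cv=inner)

    # outer loop: performance estimated on cases the tuning never saw
    scores = cross_val_score(tuned, X, y, cv=outer)
    print(f"nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

Reporting the outer-loop figure (ideally over several reshuffled outer splits) is what excludes the "lucky split" explanation.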
On the other hand, resampling validation cannot guard against drift in the underlying population - and such drift may call either for a more complex model (the human brain can correct for such drift in an amazing fashion!) or for a less complex model (one that doesn't overfit, so data drifting slightly out of the training space will not be subject to totally weird predictions).
Secondly, I'd like to argue that the usual approaches we take from numeric optimization are meant for rather different situations than what we have here. Searching for the (= one) best model may or may not be appropriate. A situation with a true global optimum may be expected when optimizing the complexity of essentially the same model (say, the ridge parameter) - a situation that may be described as selecting one member of a continuous family of models. But if the compared models span a variety of model families, I don't think the finding that several model families achieve the same performance should be too surprising at all. In fact, if I found logistic regression, LDA and a linear SVM to perform equally well, the conclusion would be "linear classification works" rather than speculation about how these models differ in their stability depending on the training cases. And still, I don't see why a non-linear model shouldn't perform just as well if sufficient training data is available.
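To illustrate (again Python/scikit-learn on hypothetical data; the particular models and data set are just examples), running the candidate families through the same cross validation scheme typically looks like this:

    from sklearn.datasets import make_classification
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                               random_state=0)

    models = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "LDA": LinearDiscriminantAnalysis(),
        "linear SVM": SVC(kernel="linear"),
        "RBF SVM (non-linear)": SVC(kernel="rbf"),  # with enough data, often just as good
    }

    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=10)
        print(f"{name:22s} accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

When those numbers come out indistinguishable (within their spread), the sensible reading is "linear decision boundaries are sufficient here", not that the cross validation failed to find the one true model.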
From a philosophical point of view, I'd say there's nothing that keeps nature from having tons of influencing factors and interactions between them. So parsimony doesn't make the model more true, it just guards against overfitting. So if (and only if) the model validation is done properly on independent cases, we don't need this safeguard, as overfitting is suitably penalized. In practice, however, cross validation frequently doesn't achieve as independent a splitting as we'd like to believe - so an additional safeguard is a very sensible precaution in practice:
In theory, there is no difference between theory and practice.
In practice, there is.
In that sense, I think that Occam's Razor is more important for us (modeling folk) than for the models: we humans are known to be notoriously bad at detecting overfitting. I'm an optimist, though, and think that detecting overfitting can be learned. :-D
Parsimony also allows us to construct predictive models that achieve reasonable predictions based on a few input variates (which are possibly easier to measure), and that are possibly easier to study, say, in terms of which parts of the input and model space are actually populated by our data. In addition, such models may be more easily correlated with (or augmented by) independent/external knowledge.