Statisticians often use parametric families of models, for example normal distributions with unknown mean and variance. However, nothing in real life is perfectly normally distributed (or distributed according to any other simple parametric family), yet in practice it seems these models work well enough to let us make useful inferences. That is, we seem to be able to use parametric models profitably in many cases where the exact distributional assumptions are not satisfied.
I'm aware of one theoretical result that helps justify this. It can be shown that, even under misspecification, the maximum likelihood estimator converges to the member of the parametric family that minimizes the relative entropy (Kullback–Leibler divergence) between the true distribution and the model. So in some sense, MLE gives the "best fit" available within the parametric family. If the family is rich enough to capture most of the phenomenon under investigation, this seems like a good justification for using MLE even when we know the distributional assumptions are not exactly met.
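To make this concrete, here is a small simulation sketch (my own toy example using NumPy/SciPy, not taken from any particular reference): the data come from a Student-t distribution, but I fit a normal by maximum likelihood. For the normal family, the divergence-minimizing ("pseudo-true") parameters are just the true mean and variance of the data-generating distribution, so the MLE lands near them despite the misspecification.

```python
import numpy as np
from scipy import stats

# Toy illustration: data from a t distribution, normal model fit by MLE.
# For the normal family, the KL-minimizing ("pseudo-true") parameters are
# the true mean and variance of the data-generating distribution, so the
# MLE should converge to them even though the model is wrong.

rng = np.random.default_rng(0)
df = 5          # degrees of freedom of the true t distribution
n = 200_000     # large sample so the asymptotic behavior is visible

x = stats.t.rvs(df, size=n, random_state=rng)

# Normal MLE: sample mean and (biased) sample variance
mu_hat = x.mean()
sigma2_hat = x.var()

# Pseudo-true parameters: mean 0, variance df/(df-2) for the t distribution
print(f"MLE:          mu = {mu_hat:.4f},  sigma^2 = {sigma2_hat:.4f}")
print(f"KL minimizer: mu = 0.0000,  sigma^2 = {df / (df - 2):.4f}")
```

Of course, the fitted normal is still wrong in the tails, which is part of what I'm asking about: in what sense, and under what conditions, is this "closest" parametric fit good enough for inference?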
Are there other theoretical results (or books, papers, etc.) that shed light on how parametric models perform under mild misspecification?