Statisticians often use parametric families of models, for example normal distributions with unknown mean and variance. However, nothing in real life is perfectly normally distributed (or distributed according to any other simple parametric family), yet in practice it seems these models work well enough to let us make useful inferences. That is, we seem to be able to use parametric models profitably in many cases where the exact distributional assumptions are not satisfied.
I'm aware of one theoretical result that helps justify this. It can be shown that, even under misspecification, the maximum likelihood estimator converges to the member of the parametric family that minimizes the relative entropy (Kullback–Leibler divergence) between the true distribution and the model. So in some sense, MLE gives the "best fit" available within the parametric family. If the family is rich enough to capture most of the phenomenon under investigation, this seems like a good justification for using MLE even when we know the distributional assumptions are not exactly met.
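To make this concrete, here is a small simulation sketch (my own toy example using NumPy/SciPy, not taken from any particular reference): the data come from a Student-t distribution, but I fit a normal by maximum likelihood. For the normal family, the divergence-minimizing ("pseudo-true") parameters are just the true mean and variance of the data-generating distribution, so the MLE lands near them despite the misspecification.

```python
import numpy as np
from scipy import stats

# Toy illustration: data from a t distribution, normal model fit by MLE.
# For the normal family, the KL-minimizing ("pseudo-true") parameters are
# the true mean and variance of the data-generating distribution, so the
# MLE should converge to them even though the model is wrong.

rng = np.random.default_rng(0)
df = 5          # degrees of freedom of the true t distribution
n = 200_000     # large sample so the asymptotic behavior is visible

x = stats.t.rvs(df, size=n, random_state=rng)

# Normal MLE: sample mean and (biased) sample variance
mu_hat = x.mean()
sigma2_hat = x.var()

# Pseudo-true parameters: mean 0, variance df/(df-2) for the t distribution
print(f"MLE:          mu = {mu_hat:.4f},  sigma^2 = {sigma2_hat:.4f}")
print(f"KL minimizer: mu = 0.0000,  sigma^2 = {df / (df - 2):.4f}")
```

Of course, the fitted normal is still wrong in the tails, which is part of what I'm asking about: in what sense, and under what conditions, is this "closest" parametric fit good enough for inference?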
Are there other theoretical results (or books, papers, etc.) that shed light on how parametric models perform under mild misspecification?