The somewhat unsettling truth is that misspecification testing is not suitable for "persuading a skeptic that the model is valid". Generally, as you obviously understand, not rejecting the $H_0$ does not imply that the $H_0$ is true, and this holds in misspecification testing as well. What the test does is something weaker: it only tells you that certain observable problems with the model assumptions have not occurred. The misspecification test still cannot rule out that the data have been generated in a way that violates the model assumptions, and possibly violates them badly. For example, an evil dependence structure could be at work that forces the data to show exactly the seemingly innocent pattern you see, while being contrived enough not to look suspicious to your favourite test for independence (I'm not claiming that this is realistic, only that a misspecification test cannot rule out that it is technically possible).
Misspecification testing can reassure you to a certain extent, but it cannot guarantee that the model assumptions are true.
Note that some would argue that the term "valid" is weaker than the term "true", and A. Spanos (2018) argues that if you do misspecification testing in the right way (i.e., testing all assumptions in a reasonable order, so that the misspecification test of one assumption is not sabotaged by the failure of another), you can ultimately be sure that the model is "valid" for the data, even though this doesn't mean it's "true". He achieves this essentially by defining "valid" as passing all those tests, because then, according to him, we know that the data look like a typical realisation from the model. I think this is misleading, though, because, as argued above, it does not rule out that the model assumptions are in fact violated in harmful ways.
One message from this is that misspecification testing is never a substitute for thinking about the subject matter and the data generating process, in order to identify problems with the assumptions that cannot be seen from the data alone.
The following are additions that were made taking into account comments and discussion:
Remark 1: In a comment, you already made reference to "severe testing" (Mayo and Spanos). Note that in their work you will never find severity calculations that refer to misspecification tests, and for good reason: models can be violated in far too many and far too complex ways to rule out all violations (or even just all relevant ones), even allowing for a certain error probability.
Remark 2: There's TOST, as in the response by Dave. This can work if we focus on one particular assumption (for example, an autocorrelation parameter $\alpha$ being zero) and take everything else in the model specification for granted. And even then we can only reject $|\alpha|>c$ for some $c>0$ (how small $c$ can be will depend on the sample size); we cannot reject $\alpha\neq 0$.
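To make this remark more concrete, here is a minimal sketch of how a TOST-style equivalence check for a lag-1 autocorrelation parameter could look. This is my own illustration, not something from the question or from Dave's response; the equivalence margin $c$, the crude large-sample approximation $r_1 \approx N(\alpha, 1/n)$, and all names in the code are assumptions of the sketch.

```python
import numpy as np
from scipy.stats import norm

def tost_lag1_autocorrelation(x, c=0.1, level=0.05):
    """TOST-style equivalence check for the lag-1 autocorrelation alpha (illustrative sketch).

    Null hypothesis (to be rejected): |alpha| >= c, i.e. relevant autocorrelation.
    'Equivalence' is claimed if both one-sided nulls alpha <= -c and alpha >= c are
    rejected at the given level. Uses the crude large-sample approximation
    r1 ~ N(alpha, 1/n), which itself takes the rest of the model for granted.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    r1 = np.sum(xc[1:] * xc[:-1]) / np.sum(xc ** 2)  # lag-1 sample autocorrelation
    se = 1.0 / np.sqrt(n)                            # approximate standard error

    p_lower = norm.sf((r1 + c) / se)   # one-sided test of H0: alpha <= -c
    p_upper = norm.cdf((r1 - c) / se)  # one-sided test of H0: alpha >= +c
    p_tost = max(p_lower, p_upper)     # both one-sided nulls must be rejected
    return r1, p_tost, p_tost < level

# Example: data that were in fact generated independently.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
r1, p, near_zero = tost_lag1_autocorrelation(x, c=0.1)
print(f"r1 = {r1:.3f}, TOST p-value = {p:.3f}, |alpha| < 0.1 supported: {near_zero}")
```

Note that even this sketch only addresses the single parameter $\alpha$, exactly in the spirit of the remark: everything else about the model is taken for granted.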
Remark 3: The original question was "how to choose the $H_0$", which I haven't really addressed up to now; instead of answering it, I will argue that we can't do much better than what is usually done. Remark 2 above is about an $H_0$ that isn't exactly the complement of the model assumption; rather, rejecting it would secure (with the usual error probability) that the true $\alpha$ is close to zero, i.e., close to the model assumption. This is really the best we can hope for, and it is no accident that even this can only be achieved by taking a host of other assumptions for granted. The point is that we can never rule out a class of distributions that is too rich, because such a class will contain distributions that are so close (in case $\alpha\neq 0$) to the model assumption that they cannot be distinguished from it by any finite amount of data, or even distributions that are very different in terms of interpretation (like the "evil dependence structure" mentioned above) but can emulate perfectly whatever we observe, and can therefore not be rejected based on the data. Famous early results in this vein are in Bahadur and Savage (1956) and Donoho (1988). In particular, there is no way to make sure that the underlying process has a density, let alone that it is normal or anything else specific. (There is less work about evil dependence structures as far as I'm aware, because detecting them is outright hopeless.)
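To illustrate the "too close to distinguish" point, here is a small simulation sketch; the AR(1) setup, the sample size, the tiny $\alpha=0.02$, and the simple $z$-test of the lag-1 autocorrelation are all illustrative choices of mine. The rejection rate of a standard independence check against such a small but nonzero $\alpha$ stays very close to the nominal level, so at this sample size the data essentially cannot tell this alternative apart from independence.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(n, alpha, rng):
    """Simulate a stationary AR(1) process x_t = alpha * x_{t-1} + e_t with e_t ~ N(0,1)."""
    x = np.empty(n)
    x[0] = rng.normal() / np.sqrt(1.0 - alpha ** 2)  # draw from the stationary distribution
    for t in range(1, n):
        x[t] = alpha * x[t - 1] + rng.normal()
    return x

def rejects_independence(x, z_crit=1.959964):
    """Standard check of H0: lag-1 autocorrelation = 0, using r1 ~ N(0, 1/n) under H0."""
    xc = x - x.mean()
    r1 = np.sum(xc[1:] * xc[:-1]) / np.sum(xc ** 2)
    return abs(r1) * np.sqrt(len(x)) > z_crit

n, alpha_true, n_sim = 200, 0.02, 2000
rejections = sum(rejects_independence(simulate_ar1(n, alpha_true, rng)) for _ in range(n_sim))
print(f"Rejection rate against alpha = {alpha_true}: {rejections / n_sim:.3f} (nominal level 0.05)")
```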
Remark 4: Furthermore, a problem with TOST is that I'd suspect it has a higher probability of rejecting a true model than the standard misspecification testing approach, and this is bad: not only would it be a (type II) error, it would also worsen the problem that running a model-based analysis conditionally on the "correct" outcome of a misspecification test can be biased, as the theory behind standard analyses doesn't take misspecification testing into account; see the Shamsudheen and Hennig arXiv paper for this issue and some more literature.
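For readers who want to see what "running a model-based analysis conditionally on the outcome of a misspecification test" means operationally, here is a simulation sketch of such a two-stage procedure. It is a toy setup of my own, with arbitrary choices of dependence strength, sample size and tests, and is not the setup of the Shamsudheen and Hennig paper; it simply compares the size of a $t$-test over all simulated datasets with its size over only those datasets that pass a preliminary independence check.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def simulate_ar1(n, alpha, rng):
    """AR(1) data with mean zero, so the H0 of the main t-test (mean = 0) is true."""
    x = np.empty(n)
    x[0] = rng.normal() / np.sqrt(1.0 - alpha ** 2)
    for t in range(1, n):
        x[t] = alpha * x[t - 1] + rng.normal()
    return x

def passes_independence_check(x, level=0.05):
    """Preliminary misspecification test: z-test of lag-1 autocorrelation = 0."""
    xc = x - x.mean()
    r1 = np.sum(xc[1:] * xc[:-1]) / np.sum(xc ** 2)
    return abs(r1) * np.sqrt(len(x)) <= stats.norm.ppf(1.0 - level / 2.0)

n, alpha_dep, n_sim = 100, 0.15, 5000   # mild dependence violating the i.i.d. assumption
all_rej = cond_rej = passed = 0
for _ in range(n_sim):
    x = simulate_ar1(n, alpha_dep, rng)
    reject_main = stats.ttest_1samp(x, 0.0).pvalue < 0.05   # model-based test of mean = 0
    all_rej += reject_main
    if passes_independence_check(x):
        passed += 1
        cond_rej += reject_main

print(f"Unconditional size of the t-test:          {all_rej / n_sim:.3f}")
print(f"Size conditional on passing the MS test:   {cond_rej / max(passed, 1):.3f}")
print(f"Fraction of datasets passing the MS test:  {passed / n_sim:.3f}")
```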
References:
Bahadur, R. and Savage, L. (1956). The nonexistence of certain statistical procedures in nonparametric problems. Annals of Mathematical Statistics 27, 1115–1122.

Donoho, D. (1988). One-sided inference about functionals of a density. Annals of Statistics 16, 1390–1420.

Spanos, A. (2018). Mis-specification testing in retrospect. Journal of Economic Surveys 32, 541–577.
There's also this (with which I agree more):
Shamsudheen, M. I. and Hennig, C. (2020). Should we test the model assumptions before running a model-based test? https://arxiv.org/abs/1908.02218