
There is a kind of simulation study that is commonly used to validate an implementation of a Bayesian model:

  • For each independent replication $i = 1, \ldots, n$:
    1. Draw a set of "true" parameters from the joint prior.
    2. Draw a dataset from the likelihood given the parameter draws from (1).
    3. Approximate the full joint posterior distribution, e.g. with MCMC or variational inference.
    4. For each parameter (index $p$), let $c_{ip} = 1$ if the $100(1 - \alpha)\%$ posterior interval covers the corresponding "true" parameter value drawn in (1), and $c_{ip} = 0$ otherwise.
  • For each parameter $p$, calculate coverage: $C_p = \frac{1}{n} \sum_{i = 1}^n c_{ip}$. If $C_p$ falls below $1 - \alpha$ by more than binomial Monte Carlo error would explain, then there is a problem in the model or the software. A minimal code sketch of the whole procedure follows.
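
Here is a minimal, runnable sketch of the procedure (Python/NumPy), assuming a toy conjugate normal-normal model with known data variance so that exact posterior draws can stand in for MCMC or variational output; all variable names and settings are illustrative, not tied to any particular library:

```python
# Toy sketch of the procedure above: conjugate normal-normal model with
# known data variance, so exact posterior draws stand in for MCMC output.
import numpy as np

rng = np.random.default_rng(0)

n_reps = 1000          # number of independent replications
n_obs = 20             # observations per simulated dataset
alpha = 0.05           # nominal 95% posterior intervals
mu0, tau0 = 0.0, 1.0   # prior: mu ~ Normal(mu0, tau0^2)
sigma = 1.0            # known likelihood standard deviation

covered = np.zeros(n_reps, dtype=bool)
for i in range(n_reps):
    # 1. Draw the "true" parameter from the prior.
    mu_true = rng.normal(mu0, tau0)
    # 2. Draw a dataset from the likelihood given that parameter.
    y = rng.normal(mu_true, sigma, size=n_obs)
    # 3. "Approximate" the posterior; it is conjugate here, so exact
    #    posterior draws replace what MCMC would normally produce.
    post_prec = 1.0 / tau0**2 + n_obs / sigma**2
    post_mean = (mu0 / tau0**2 + y.sum() / sigma**2) / post_prec
    post_draws = rng.normal(post_mean, np.sqrt(1.0 / post_prec), size=4000)
    # 4. Does the central 100(1 - alpha)% interval cover the true value?
    lo, hi = np.quantile(post_draws, [alpha / 2, 1 - alpha / 2])
    covered[i] = lo <= mu_true <= hi

coverage = covered.mean()                            # C_p for the one parameter mu
mc_se = np.sqrt(coverage * (1 - coverage) / n_reps)  # binomial Monte Carlo error
print(f"coverage: {coverage:.3f} (target {1 - alpha}, MC s.e. about {mc_se:.3f})")
```

With a correct model and correct code, `coverage` should land within Monte Carlo error of $1 - \alpha$; a value well below that is the failure signal described in step (4).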

This technique is super useful in my team's work, and it has caught a lot of errors. Does anyone know if it has an established name? I have been searching but have been unable to find one. At first I thought it was called "simulation-based calibration", but what I am describing performs the coverage check in step (4) above instead of the calibration step.

References

  • Gelman, Andrew, Aki Vehtari, Daniel Simpson, Charles C. Margossian, Bob Carpenter, Yuling Yao, Lauren Kennedy, Jonah Gabry, Paul-Christian Bürkner, and Martin Modrák. 2020. “Bayesian Workflow.” https://arxiv.org/abs/2011.01808.

  • Cook, Samantha R., Andrew Gelman, and Donald B. Rubin. 2006. “Validation of Software for Bayesian Models Using Posterior Quantiles.” Journal of Computational and Graphical Statistics 15 (3): 675–92. http://www.jstor.org/stable/27594203.

  • Talts, Sean, Michael Betancourt, Daniel Simpson, Aki Vehtari, and Andrew Gelman. 2020. “Validating Bayesian Inference Algorithms with Simulation-Based Calibration.” http://arxiv.org/abs/1804.06788.

landau
    Have you considered "posterior predictive checks"? – svendvn Mar 09 '21 at 23:54
  • I often do when feasible, but this particular simulation does not use the posterior predictive distribution (only the marginal posterior of each parameter). “Posterior predictive checks” and “posterior checks” sound a bit too general for this. – landau Mar 09 '21 at 23:57
  • Also, I would like to find the name that is already widely used in the community, rather than try to invent a name myself – landau Mar 09 '21 at 23:59
  • Never heard of it. – Xi'an Mar 10 '21 at 07:12
  • Whilst I have heard of people investigating the frequentist properties of credible regions, CRs offer no coverage guarantees. And so I can’t see why you’d conclude a bug in software or model if it didn’t have a particular coverage. – innisfree Mar 10 '21 at 15:43
  • @innisfree I see your point about frequentism. However, it isn't all that different from actual SBC. http://www.jstor.org/stable/27594203 section 2 paragraph 1 explicitly claims their quantile method generalizes what I described, and SBC in https://arxiv.org/abs/2011.01808 generalizes further. All 3 approaches take independent draws from the prior predictive distribution and approximate the posterior for each prior predictive draw. And all 3 approaches compare posterior quantiles to prior predictive draws from simulations. – landau Mar 11 '21 at 11:11
  • No. What they describe in Sec. 2 is more like an average coverage, averaged over the prior, which agrees with the amount of probability in the CR. – innisfree Mar 11 '21 at 15:30
  • It does not say that CR has correct coverage for every choice of possible parameter. – innisfree Mar 11 '21 at 15:31
  • Proof of the average property: just write the joint as $p(x,D|M) = p(x|D,M) p(D|M)$. This is the distribution from which we’re sampling true parameters and data. Then it is clear that in every simulation, for whatever $D$ you draw, since $p(x|D,M)$ is the posterior, there’s an X% chance you get a draw that lies in the X% CR. – innisfree Mar 11 '21 at 15:34
  • Actually, re-reading, I now find your question ambiguous: do you test for correct average coverage? Or correct coverage for each possible true value? – innisfree Mar 11 '21 at 15:37
  • I calculate coverage as an average over all prior predictive draws. For any individual prior predictive draw on its own, all I calculate is hit or miss. In other words, I was thinking of coverage itself as an average over the prior. (If I understand correctly, the issue you just raised seems like what section 2 of https://arxiv.org/pdf/1804.06788.pdf is talking about.) – landau Mar 11 '21 at 15:44
  • Thinking out loud: if a rank statistic in SBC is itself a kind of coverage, the aggregate does seem to be a kind of "average coverage". – landau Mar 11 '21 at 15:54
  • Just edited the original post to try to be clearer. – landau Mar 11 '21 at 19:33
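
For reference, the average-coverage argument sketched in the comments above can be written out explicitly (a minimal derivation in the comments' notation: $x$ for the parameters, $D$ for the data, $M$ for the model, and $\mathrm{CR}_{1-\alpha}(D)$ for a $100(1-\alpha)\%$ credible region computed from the correct posterior):

$$
\Pr\bigl(x \in \mathrm{CR}_{1-\alpha}(D)\bigr)
= \int \Pr\bigl(x \in \mathrm{CR}_{1-\alpha}(D) \mid D, M\bigr)\, p(D \mid M)\, \mathrm{d}D
= \int (1-\alpha)\, p(D \mid M)\, \mathrm{d}D
= 1 - \alpha,
$$

since the joint factorizes as $p(x, D \mid M) = p(x \mid D, M)\, p(D \mid M)$, so conditional on each simulated $D$ the "true" parameter is itself a draw from the posterior, which places probability $1 - \alpha$ in the credible region. The coverage $C_p$ in the question therefore targets $1 - \alpha$ on average over the prior, not for every fixed true parameter value.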

0 Answers