In general, when performing posterior predictive checks, one estimates a posterior predictive p-value like so: $$p_B = \frac{1}{S}\sum_{s=1}^{S}\mathbb{1}\left(T(x^{\text{rep},s},\theta^{(s)}) \ge T(x,\theta^{(s)})\right)$$ for some test quantity $T(x,\theta)$, posterior draws $\theta^{(s)}$, and replicate datasets $x^{\text{rep},s}$ drawn from $p(x\mid\theta^{(s)})$.
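To make the Monte Carlo estimate concrete, here is a minimal Python/NumPy sketch of that sum (the function name `posterior_predictive_pvalue` and the helper arguments `simulate_rep` and `T` are my own illustration, not taken from any particular library):

```python
import numpy as np

def posterior_predictive_pvalue(x, theta_draws, simulate_rep, T):
    """Monte Carlo estimate of p_B.

    x            : observed dataset (1-d array)
    theta_draws  : sequence of S posterior draws of theta
    simulate_rep : function (theta, n) -> replicate dataset of size n
    T            : test quantity, a function of (data, theta)
    """
    n = len(x)
    exceed = np.empty(len(theta_draws), dtype=bool)
    for s, theta in enumerate(theta_draws):
        x_rep = simulate_rep(theta, n)              # draw x^{rep,s} ~ p(x | theta^{(s)})
        exceed[s] = T(x_rep, theta) >= T(x, theta)  # the indicator inside the sum
    return exceed.mean()                            # proportion of exceedances = p_B
```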
I understand that this method is meant only for probing the model for areas of weakness, and that using the data twice is a necessary evil. See here for more details. That's not my concern, however.
The whole idea is that we draw conclusions about our model from the outcome (ideally $p_B \approx 0.5$), and that outcome depends on the test quantity/statistic we choose. And while we know not to select a $T(X)$ that is a sufficient statistic for the model (like the mean or variance in an exponential family), who is to say that the $T(X)$ we do select tells us anything useful about the model?
For example, say my test statistic is $\min(X)$. A poor test statistic like that can only tell me that my model does not capture this one aspect of the data well, given the prior and likelihood I set up. I could then tweak my likelihood until the posterior predictive check looks better, but at that point I'd potentially be overfitting, and what's more, overfitting to a test statistic that I merely think is a good one.
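To see both points in one toy example (everything here is invented purely for illustration and reuses the hypothetical `posterior_predictive_pvalue` sketch above): fit a conjugate normal model with unit variance to data that actually has heavy tails. A check with $\min(X)$ comes out far from 0.5, telling me only that the lower tail is off, while a check with the sample mean, which is sufficient for this model, sits near 0.5 no matter how wrong the tails are.

```python
rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=100)         # data with heavier tails than the assumed model

# Conjugate model: x_i ~ N(mu, 1) with prior mu ~ N(0, 100)
n = len(x)
post_var = 1.0 / (1.0 / 100 + n)           # posterior variance of mu
post_mean = post_var * n * x.mean()        # posterior mean of mu
theta_draws = rng.normal(post_mean, np.sqrt(post_var), size=4000)
sim = lambda mu, m: rng.normal(mu, 1.0, size=m)

p_min  = posterior_predictive_pvalue(x, theta_draws, sim, lambda d, t: d.min())   # typically close to 1
p_mean = posterior_predictive_pvalue(x, theta_draws, sim, lambda d, t: d.mean())  # typically close to 0.5
print(p_min, p_mean)
```

Either extreme ($p_B$ near 0 or near 1) flags a misfit in that one direction, which is exactly the limited kind of insight I'm questioning here.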
It seems to me a rather complex and convoluted approach to model diagnostics compared to cross-validation and sensitivity analysis. Outside of academia and tightly controlled settings with well-specified Bayesian models, do practitioners actually use posterior predictive checks? Does anyone have real-world use cases where this approach was helpful? I'm happy to use them in day-to-day analysis because I strongly believe in model checking, but this seems like a hassle for little insight gained.