A common misinterpretation of a p-value is that it represents the probability of a false positive in the context of hypothesis testing. Here a "positive" means rejecting the null.
There are many ways to explain why this is a misinterpretation. Here is my favorite: in most two-tailed hypothesis testing applications, the probability that the null hypothesis is true is precisely 0 even before we consider any data, because a real effect size is never exactly 0. A false positive means rejecting a true null, so if the null is never true, the probability of a false positive is zero regardless of the p-value.
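To make that concrete, here is a toy simulation (my own illustration; the continuous prior on the effect size is arbitrary). Because the true $\beta$ is almost surely nonzero, every rejection is a rejection of a false null, so the false-positive rate is exactly zero however many rejections we make.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n = 100_000, 50

# Toy setup: true effects drawn from a continuous distribution are almost
# surely nonzero, so the point null H0: beta = 0 is false in every "study".
beta = rng.normal(0.0, 0.2, size=n_sims)
se = 1 / np.sqrt(n)
beta_hat = rng.normal(beta, se)                  # one estimate per study
p = 2 * stats.norm.sf(np.abs(beta_hat) / se)     # two-sided p-values

reject = p < 0.05
false_positive = reject & (beta == 0.0)          # rejecting a TRUE null
print(f"rejection rate:      {reject.mean():.3f}")
print(f"false-positive rate: {false_positive.mean():.3f}")  # exactly 0.000
```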
This explanation is interesting for far more than understanding p-values. It highlights the tension in trying to compare a set of measure 0 ($H_0: \beta = 0$) against [typically] everything else ($H_1: \beta \neq 0$). There are two basic approaches to fixing that asymmetry:
1. Replace the alternative $H_1: \beta \neq 0$ with a single value, like $H_1: \beta = \beta^*$. Thus we weigh evidence for two specific values of $\beta$, namely $0$ vs. $\beta^*$.
2. Define both $H_0$ and $H_1$ in terms of sets with strictly positive measure. For example, set $H_0: \beta < 0$ and $H_1: \beta > 0$.
With (1), a standard approach is to apply a minimum Bayes factor (MBF), essentially an upper bound on how strongly the data can favor any particular alternative parameter value (such as the MLE) over the null value, which in turn bounds the posterior probability of $H_0$ from below. A much-cited result is that even this lower bound on the probability of $H_0$ can easily be substantially greater than the p-value. The typical framing of this result is that using an MBF is somehow better than thinking about a p-value.
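To see the size of that gap, here is a quick numerical sketch. I'm assuming a z-test, the familiar $e^{-z^2/2}$ bound for the MBF, and 50/50 prior odds on $H_0$; the numbers are only illustrative.

```python
import numpy as np
from scipy import stats

def mbf_posterior(p_two_sided, prior_odds_h0=1.0):
    """Return the minimum Bayes factor exp(-z^2/2) for a z-test and the
    implied lower bound on P(H0 | data), here with 1:1 prior odds on H0."""
    z = stats.norm.isf(p_two_sided / 2)      # |z| that matches the p-value
    mbf = np.exp(-z**2 / 2)                  # min BF of H0 vs best alternative
    post_odds = prior_odds_h0 * mbf          # lower bound on posterior odds of H0
    return mbf, post_odds / (1 + post_odds)

for p in (0.05, 0.01, 0.001):
    mbf, post_h0 = mbf_posterior(p)
    print(f"p = {p:<6}  MBF = {mbf:.3f}  P(H0 | data) >= {post_h0:.3f}")
# p = 0.05 gives P(H0 | data) >= 0.128, well above the p-value itself.
```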
But let's also consider a specific flavor of (2), which I'll call directional correctness. Specifically, suppose the hypothesis we're actually interested in is a function of the data: $H_0$ is the event that the true value of $\beta$ does NOT have the same sign as our estimator $\hat \beta$, and $H_1$ is the complement of $H_0$. Basically, if I have an estimated effect, the next thing I often want to know is "how certain can I be that the estimate is not so far off from the truth that it has the opposite sign?".
Evaluating directional correctness aligns conceptually with looking at a posterior distribution and measuring the smaller tail (with respect to 0, or any other center of interest). And in the special cases where a credible interval coincides with a standard confidence interval, the one-sided p-value is precisely that tail probability (the usual two-sided p-value is twice it).
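Here is a minimal sketch of that correspondence, assuming a flat prior and a Gaussian likelihood, so the posterior for $\beta$ is normal, centered at $\hat\beta$ with the usual standard error; the numbers are made up.

```python
from scipy import stats

# Sketch under simple assumptions: flat prior + Gaussian likelihood, so the
# posterior for beta is N(beta_hat, se^2). The posterior probability that beta
# has the opposite sign from beta_hat then equals the one-sided p-value.
beta_hat, se = 0.8, 0.5                     # hypothetical estimate and std. error
z = beta_hat / se

p_one_sided = stats.norm.sf(abs(z))
p_two_sided = 2 * p_one_sided

posterior = stats.norm(loc=beta_hat, scale=se)
wrong_sign = posterior.cdf(0.0) if beta_hat > 0 else posterior.sf(0.0)

print(f"posterior P(wrong sign) = {wrong_sign:.4f}")
print(f"one-sided p-value       = {p_one_sided:.4f}   # identical")
print(f"two-sided p-value       = {p_two_sided:.4f}   # twice the tail")
```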
I'm tempted to conclude that for purposes of evaluating directional correctness of effects, plain old p-values are a more reasonable place to start than getting into MBFs. What is the weakest link in this line of reasoning?