
I frequently read that the Bonferroni correction also works for dependent hypotheses. However, I don't think that is true, and I have a counterexample. Can somebody please tell me (a) where my mistake is, or (b) whether I am correct on this?

Setting up the counterexample

Assume we are testing two hypotheses. Let $H_{1}=0$ if the first hypothesis is false (i.e., the first null hypothesis is true) and $H_{1}=1$ otherwise. Define $H_{2}$ similarly. Let $p_{1},p_{2}$ be the p-values associated with the two hypotheses, and let $[\![\cdot]\!]$ denote the indicator function of the set specified inside the brackets.

For fixed $\theta\in(0,1)$ define \begin{eqnarray*} P\left(p_{1},p_{2}|H_{1}=0,H_{2}=0\right) & = & \frac{1}{2\theta}[\![0\le p_{1}\le\theta]\!]+\frac{1}{2\theta}[\![0\le p_{2}\le\theta]\!]\\ P\left(p_{1},p_{2}|H_{1}=0,H_{2}=1\right) & = & P\left(p_{1},p_{2}|H_{1}=1,H_{2}=0\right)\\ & = & \frac{1}{\left(1-\theta\right)^{2}}[\![\theta\le p_{1}\le1]\!]\cdot[\![\theta\le p_{2}\le1]\!] \end{eqnarray*} which are obviously probability densities over $[0,1]^{2}$ (each integrates to one). Here is a plot of the two densities:

[Figure: the two conditional densities on $[0,1]^{2}$.]
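
To make the construction concrete, here is a minimal sketch of how one could sample from these two conditional densities (Python with NumPy is assumed; the function names are mine). The first density is a 50/50 mixture of a uniform on the strip $[0,\theta]\times[0,1]$ and a uniform on $[0,1]\times[0,\theta]$; the second is uniform on $[\theta,1]^{2}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_both_null(theta, n):
    """Draw (p1, p2) from P(p1, p2 | H1=0, H2=0): a 50/50 mixture of
    uniforms on the strips [0, theta] x [0, 1] and [0, 1] x [0, theta]."""
    p = rng.uniform(0, 1, size=(n, 2))
    which = rng.integers(0, 2, size=n)   # pick which coordinate is squeezed
    p[np.arange(n), which] *= theta      # that coordinate becomes U[0, theta]
    return p

def sample_one_null(theta, n):
    """Draw (p1, p2) from P(p1, p2 | H1=0, H2=1): uniform on [theta, 1]^2."""
    return rng.uniform(theta, 1, size=(n, 2))
```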

Marginalization yields \begin{eqnarray*} P\left(p_{1}|H_{1}=0,H_{2}=0\right) & = & \frac{1}{2\theta}[\![0\le p_{1}\le\theta]\!]+\frac{1}{2}\\ P\left(p_{1}|H_{1}=0,H_{2}=1\right) & = & \frac{1}{\left(1-\theta\right)}[\![\theta\le p_{1}\le1]\!] \end{eqnarray*} and similarly for $p_{2}$.

Furthermore, let \begin{eqnarray*} P\left(H_{2}=0|H_{1}=0\right) & = & P\left(H_{1}=0|H_{2}=0\right)=\frac{2\theta}{1+\theta}\\ P\left(H_{2}=1|H_{1}=0\right) & = & P\left(H_{1}=1|H_{2}=0\right)=\frac{1-\theta}{1+\theta}. \end{eqnarray*} This implies that \begin{eqnarray*} P\left(p_{1}|H_{1}=0\right) & = & \sum_{h_{2}\in\{0,1\}}P\left(p_{1}|H_{1}=0,H_{2}=h_{2}\right)P\left(H_{2}=h_{2}|H_{1}=0\right)\\ & = & \frac{1}{2\theta}[\![0\le p_{1}\le\theta]\!]\frac{2\theta}{1+\theta}+\frac{1}{2}\frac{2\theta}{1+\theta}+\frac{1}{\left(1-\theta\right)}[\![\theta\le p_{1}\le1]\!]\frac{1-\theta}{1+\theta}\\ & = & \frac{1}{1+\theta}[\![0\le p_{1}\le\theta]\!]+\frac{\theta}{1+\theta}+\frac{1}{1+\theta}[\![\theta\le p_{1}\le1]\!]\\ & = & 1, \end{eqnarray*} so $p_{1}$ is uniform on $[0,1]$ given $H_{1}=0$, as required for a p-value under the null hypothesis. The same holds true for $p_{2}$ by symmetry.
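
As a numerical sanity check (a sketch, assuming NumPy and SciPy are available), one can simulate the two-stage draw, first $H_2$ given $H_1=0$ and then $p_1$ from the matching conditional density, and compare the resulting sample against $U[0,1]$ with a Kolmogorov-Smirnov test:

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(1)
theta, n = 0.3, 200_000

# Draw H2 given H1 = 0:  P(H2 = 0 | H1 = 0) = 2*theta / (1 + theta).
h2_null = rng.uniform(size=n) < 2 * theta / (1 + theta)
m = int(h2_null.sum())
p1 = np.empty(n)

# Given H1 = 0, H2 = 0: p1 ~ U[0, theta] with prob 1/2, else p1 ~ U[0, 1].
strip = rng.integers(0, 2, size=m).astype(bool)
p1[h2_null] = np.where(strip, rng.uniform(0, theta, m), rng.uniform(0, 1, m))
# Given H1 = 0, H2 = 1: p1 ~ U[theta, 1].
p1[~h2_null] = rng.uniform(theta, 1, n - m)

print(kstest(p1, 'uniform'))  # large KS p-value: consistent with U[0, 1]
```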

To get the joint distribution $P\left(H_{1},H_{2}\right)$ we compute

\begin{eqnarray*} P\left(H_{2}=0|H_{1}=0\right)P\left(H_{1}=0\right) & = & P\left(H_{1}=0|H_{2}=0\right)P\left(H_{2}=0\right)\\ \Leftrightarrow\frac{2\theta}{1+\theta}P\left(H_{1}=0\right) & = & \frac{2\theta}{1+\theta}P\left(H_{2}=0\right)\\ \Leftrightarrow P\left(H_{1}=0\right) & = & P\left(H_{2}=0\right)=:q \end{eqnarray*} Therefore, the joint distribution is given by \begin{eqnarray*} P\left(H_{1},H_{2}\right) & = & \begin{array}{ccc} & H_{2}=0 & H_{2}=1\\ H_{1}=0 & \frac{2\theta}{1+\theta}q & \frac{1-\theta}{1+\theta}q\\ H_{1}=1 & \frac{1-\theta}{1+\theta}q & \frac{1+\theta-2q}{1+\theta} \end{array} \end{eqnarray*} which means that $0\le q\le\frac{1+\theta}{2}$ so that $P\left(H_{1}=1,H_{2}=1\right)\ge0$.
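
A quick symbolic check (a sketch, assuming SymPy) that this table is a valid joint distribution: the four cells sum to one and the first row recovers $P(H_1=0)=q$.

```python
import sympy as sp

theta, q = sp.symbols('theta q', positive=True)

p00 = 2 * theta / (1 + theta) * q        # P(H1=0, H2=0)
p01 = (1 - theta) / (1 + theta) * q      # P(H1=0, H2=1)
p10 = (1 - theta) / (1 + theta) * q      # P(H1=1, H2=0)
p11 = (1 + theta - 2 * q) / (1 + theta)  # P(H1=1, H2=1)

assert sp.simplify(p00 + p01 + p10 + p11 - 1) == 0  # cells sum to one
assert sp.simplify(p00 + p01 - q) == 0              # marginal P(H1=0) = q
```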

Why it is a counterexample

Now let $\theta=\frac{\alpha}{2}$ for the significance level $\alpha$ of interest. Given that both hypotheses are false (i.e., both nulls are true, $H_{i}=0$), the probability of getting at least one false positive at the corrected significance level $\frac{\alpha}{2}$ is \begin{eqnarray*} P\left(\left(p_{1}\le\frac{\alpha}{2}\right)\vee\left(p_{2}\le\frac{\alpha}{2}\right)|H_{1}=0,H_{2}=0\right) & = & 1 \end{eqnarray*} because, by construction, at least one of $p_{1}$ and $p_{2}$ falls below $\theta=\frac{\alpha}{2}$ with probability one given $H_1=0$ and $H_2=0$. The Bonferroni correction, however, would claim that the FWER is at most $\alpha$.
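
A short simulation (a sketch, assuming NumPy) of this claim: drawing from the density given $H_1=0,H_2=0$ with $\theta=\alpha/2$, every single draw yields at least one p-value at or below the Bonferroni threshold.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = 0.05
theta = alpha / 2
n = 100_000

# Sample from P(p1, p2 | H1=0, H2=0): one coordinate is squeezed into [0, theta].
p = rng.uniform(0, 1, size=(n, 2))
which = rng.integers(0, 2, size=n)
p[np.arange(n), which] *= theta

fwer = np.mean((p[:, 0] <= alpha / 2) | (p[:, 1] <= alpha / 2))
print(fwer)  # 1.0: a false positive on every draw, despite the correction
```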

fabee
  • Very good question. I wish someone would answer –  Feb 29 '16 at 02:11
  • 1
    The opposite of conservative is anticonservative in the statistical world! – AdamO Jul 20 '16 at 23:40
  • Didn't know that. I thought I read liberal a few times. – fabee Jul 21 '16 at 01:39
  • see http://stats.stackexchange.com/questions/235856/can-bonferroni-be-applied-for-dependent-multiple-tests/236013#236013 –  Sep 21 '16 at 19:55
  • Thanks, but that's about something different. You need an additional assumption (dependence is not the problem, see my answer below). – fabee Sep 23 '16 at 02:44

2 Answers


Bonferroni can't be liberal, regardless of dependence, if your p-values are computed correctly.

Let A be the event of Type I error in one test and let B be the event of Type I error in another test. The probability that A or B (or both) will occur is:

P(A or B) = P(A) + P(B) - P(A and B)

Because P(A and B) is a probability and thus can't be negative, there's no possible way for that equation to produce a value higher than P(A) + P(B). The equation produces its highest value when P(A and B) = 0, i.e. when A and B are mutually exclusive (perfectly negatively dependent). In that case, you can fill in the equation as follows, assuming both nulls are true and a Bonferroni-adjusted alpha level of .025:

P(A or B) = P(A) + P(B) - P(A and B) = .025 + .025 - 0 = .05

Under any other dependence structure, P(A and B) > 0, so the equation produces a value even smaller than .05. For example, under perfect positive dependence, P(A and B) = P(A), in which case you can fill in the equation as follows:

P(A or B) = P(A) + P(B) - P(A and B) = .025 + .025 - .025 = .025

Another example: under independence, P(A and B) = P(A)P(B). Hence:

P(A or B) = P(A) + P(B) - P(A and B) = .025 + .025 - .025*.025 = .0494
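
To illustrate the three scenarios numerically, here is a sketch (assuming NumPy and SciPy; one-sided z-tests stand in for any correctly computed p-values) that simulates two correlated test statistics with both nulls true and estimates the Bonferroni FWER across dependence structures:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
alpha, n = 0.05, 200_000

for rho in (-1.0, -0.5, 0.0, 0.5, 1.0):
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    p = norm.sf(z)                               # one-sided p-values, U[0,1] under the null
    fwer = np.mean((p < alpha / 2).any(axis=1))  # Bonferroni-adjusted threshold
    print(f"rho = {rho:+.1f}: FWER ~ {fwer:.4f}")
# rho = -1 gives ~.05 (mutually exclusive errors), rho = 0 gives ~.0494,
# and rho = +1 gives ~.025 (perfect positive dependence); never above .05.
```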

As you can see, if one event has a probability of .025 and another event also has a probability of .025, it’s impossible for the probability of “one or both” events to be greater than .05, because it’s impossible for P(A or B) to be greater than P(A) + P(B). Any claim to the contrary is logically nonsensical.

"But that's assuming both nulls are true," you might say. "What if the first null is true and the second is false?" In that case, B is impossible because you can't have a Type I error where the null hypothesis is false. Thus, P(B) = 0 and P(A and B) = 0. So let's fill in our general formula for the FWER of two tests:

P(A or B) = P(A) + P(B) - P(A and B) = .025 + 0 - 0 = .025

So once again the FWER is < .05. Note that dependence is irrelevant here because P(A and B) is always 0. Another possible scenario is that both nulls are false, but it should be obvious that the FWER would then be 0, and thus < .05.

Bonferroni
  • Thanks for the answer. I read derivations like yours many times and they make sense. However, I still don't see the mistake in my example. If it is nonsensical, where is my mistake? I have the feeling that the problem is that you take $P(A)$ to be $P(A|H_0^{(1)}=True)$, but for the FWER you are actually interested in $P(A\vee B|H_0^{(1)}=True\wedge H_0^{(2)}=True)$. You can still have $P(A|H_0^{(1)}=True)=\alpha$ but $P(A|H_0^{(1)}=True\wedge H_0^{(2)}=True)>\alpha$. This is what I constructed in my example. Your example is correct if the Type I error is independent of the other hypothesis. – fabee Jul 20 '16 at 02:10
  • Computing the FWER assumes both nulls are true, so P(A) means the same thing as P(A|null 1 is true) and P(B) means the same thing as P(B|null 2 is true). Conditional probabilities are thus unnecessary. Maybe you should rewrite your example without them. Note that if "all values of p1 and p2 are lower than α/2 given that H1=0 and H2=0 by construction," then you've simply constructed a scenario in which the p-values aren't computed correctly. If each p is tested at α/2, each p must have an α/2 chance of significance by definition, yet you've apparently given each p 100% chance of significance. – Bonferroni Jul 20 '16 at 13:28
  • I don't think you are right. If the FWER assumes both nulls are true, then I want to compute P(A or B | null 1 and null 2 are true). The decomposition you wrote in your answer therefore needs the *same* condition on the right-hand side. Only when using conditional probabilities does this become clear. My p-values are computed correctly because P(A|null 1 is true) is still $\alpha$, as it should be. But note that P(A|null 1 is true) is generally not the same as P(A|null 1 and null 2 are true). – fabee Jul 20 '16 at 16:37
  • 1
    Draw a big square on a piece of paper representing the total sample space of possible outcomes. Then draw a circle that takes up 2.5% of the area of the square and label it A. Then draw another circle that takes up 2.5% of the area of the square and label it B. Make A and B overlap as little or as much as you want (i.e. play with the dependence between A and B). You'll find there's no way for the combined area of A and B to be more than 2.5%+2.5%=5%. – Bonferroni Jul 20 '16 at 17:55
  • 1
    It seems you're confused about probability on a very fundamental level and aren't ready to tackle the mathematics yet. We assume both nulls are true because that's the situation that produces the maximum FWER. If both nulls are false, there obviously can't be any Type I error at all. And if one null is true and one null is false, the error rate is simply whatever alpha level you use to test the true one. – Bonferroni Jul 20 '16 at 18:12
  • I think you didn't understand my reply. You are right: the decomposition is true regardless of the dependence of A and B. But this is only true if you are using the correct marginals from the joint distribution. Let CB = "both nulls true", and C1, C2 = "null 1 true", "null 2 true". You are telling me P(A or B|CB) <= P(A|C1) + P(B|C2), while I say I cannot do that because the decomposition should be P(A or B|CB) <= P(A|CB) + P(B|CB). You cannot simply change the condition. That is simple algebra. Nothing to be confused about. – fabee Jul 20 '16 at 21:08
  • The probability of Type I error in a given hypothesis test doesn't depend on whether some other hypothesis is true. The probability of Type I error in a given test is--by definition--simply the alpha level at which you conduct that test (assuming the null is true for that test of course). One test doesn't know or care whether you are even planning to conduct any other tests at all. Thus, P(A|CB) = P(A|C1) and P(B|CB) = P(B|C2). So your preferred form, P(A or B|CB) <= P(A|CB) + P(B|CB), can be rewritten as P(A or B|CB) <= P(A|C1) + P(B|C2). You are making a distinction where there is none. – Bonferroni Jul 21 '16 at 00:34
  • It's not true in general that the p-values (and therefore the Type I errors) of two different hypotheses are independent. Just sample a few datasets from a normal distribution, test "mean = 0" and "median = 0" on each of them, and plot the p-values. Therefore, P(A|CB) = P(A|C1) must be an assumption of the Bonferroni correction. Otherwise, my example above provides a valid joint distribution of p-values such that P(A|C1) = alpha, as it should be, but P(A|CB) > alpha. – fabee Jul 21 '16 at 02:11
  • As long as you have legit p-values, Bonferroni requires no assumptions about dependency, number of true nulls, or anything else, as my answer clearly demonstrated. You say "It's not true in general that the joint distribution of two p-values (and therefore, type I errors) are independent for two different hypotheses." But just because the p-values of two tests can be dependent doesn't mean that the p-value for one hypothesis depends on whether the other hypothesis is true (that would be a non-sequitur!). Nor can the dependency alter either individual test's long-term error rate. – Bonferroni Jul 21 '16 at 02:59
  • Well, if you say so. I still challenge you to find the mistake in my example above. Your answer didn't clearly demonstrate what's wrong with it. That's what I am really interested in. – fabee Jul 21 '16 at 03:43
  • I don't think there's just one mistake. It's hard to figure out what you're even trying to do throughout the example. At one point, you refer to probabilities of nulls being true or false, e.g. P(H2=0|H1=0)--as if in a Bayesian framework, which makes no sense in this context. At the end you say that the FWER is 1, without offering any explanation except saying that both p values are significant under the null "by construction." Obviously, if your p-values are defined to always be significant under the null, they aren't legit p-values. My answer proves you wrong using elementary probability. – Bonferroni Jul 21 '16 at 05:35

I think I finally have the answer. I need an additional requirement on the distribution $P(p_1,p_2|H_1=0, H_2=0)$. Before, I only required that $P(p_1|H_1=0)$ be uniform on $[0,1]$. In that case my example is correct and Bonferroni can be too liberal. However, if I additionally require uniformity of $P(p_1|H_1=0, H_2=0)$, then it is easy to derive that Bonferroni can never be anticonservative (too liberal). My example violates this additional assumption. In more general terms, the assumption is that the joint distribution of all p-values, given that all null hypotheses are true, must have the form of a copula: jointly they need not be uniform, but marginally they must be.
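
As an illustration of this assumption (a sketch, assuming NumPy and SciPy), p-values generated through a Gaussian copula are strongly dependent, yet marginally uniform given that all nulls are true, and the Bonferroni bound then holds:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
alpha, rho, n = 0.05, 0.8, 200_000

cov = [[1.0, rho], [rho, 1.0]]
z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
p = norm.cdf(z)   # Gaussian copula: jointly dependent, marginally U[0, 1]

print(np.mean(p[:, 0] < alpha / 2))          # ~ .025: each test at level alpha/2
print(np.mean((p < alpha / 2).any(axis=1)))  # FWER: stays below alpha = .05
```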

Comment: If anyone can point me to a source where this assumption is clearly stated (textbook, paper), I'll accept this answer.

fabee