Do I have to correct for multiplicity twice if I re-run a comparison with one more hypothesis during incremental testing?

Question

Suppose I have a set of hypotheses ordered in time, each comparing one time point with a common baseline: $H_{t_1 \text{ vs. baseline}}$, $H_{t_2 \text{ vs. baseline}}$, $H_{t_3 \text{ vs. baseline}}$, ...

I have 10 time points and therefore 10 hypotheses. I want to find the first run of three consecutive statistically significant results, something like this:

H1  - not significant
H2  - significant  *** THIS
H3  - significant  *** THIS
H4  - significant  *** THIS
H5  - not significant
H6  - not significant
H7  - significant
H8  - significant
H9  - significant
H10 - significant

There are various multiple-comparison methods, e.g. Dunnett's. But I want to stop after the first occurrence; I don't care about the subsequent ones. So I don't want to "penalize" my significance level too much from the start.

I want to start with the first set (H1-H3) and check it at $\alpha=0.05/3$. If no run of three is found, I expand the set by the next hypothesis (H1-H4) and check it at $\alpha=0.05/4$, and so on until either a run is found or the full set (H1-H10) has been examined.
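The expanding-window procedure described above can be sketched as follows. This is only an illustration: the function name `first_significant_run`, the rejection rule `p <= alpha / k`, and the example p-values are assumptions made for the sketch, not part of the original setup.

```python
def first_significant_run(p_values, alpha=0.05, run_length=3):
    """Expand the tested window one hypothesis at a time.

    Each window of size k is checked with a Bonferroni level alpha / k.
    Returns (window_size, start_index) for the first window that contains
    `run_length` consecutive rejections (start_index is 0-based),
    or None if no window does.
    """
    for k in range(run_length, len(p_values) + 1):
        threshold = alpha / k  # Bonferroni over the k hypotheses in this window
        significant = [p <= threshold for p in p_values[:k]]
        for start in range(k - run_length + 1):
            if all(significant[start:start + run_length]):
                return k, start
    return None


# Hypothetical p-values for H1..H10, invented for illustration only.
example_p = [0.20, 0.001, 0.002, 0.003, 0.30, 0.40, 0.001, 0.001, 0.001, 0.001]
print(first_significant_run(example_p))  # window size and 0-based start of the run
```

With these made-up values, the first window (H1-H3) contains no run of three, and the run H2-H4 is found once the window grows to H1-H4.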

So within each run the FWER is controlled. But what about the overall FWER? The decision to move on to the next step depends on the outcome of the previous one ("not found").
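One rough way to probe the overall error rate is to look at the loosest level at which each hypothesis is ever tested across all steps. The sketch below rests on several assumptions that are not in the original setup: all nulls are true, the p-values are independent and uniform (comparisons against a shared baseline would normally be correlated), every step is carried out, and an "error" means at least one single rejection at any step. The helper names are hypothetical.

```python
from math import prod


def loosest_thresholds(n_hypotheses=10, alpha=0.05, first_window=3):
    """Loosest level each hypothesis is tested at over all steps.

    Hypothesis i (1-based) first appears in the window of size
    max(i, first_window); later windows only shrink alpha / k,
    so the level at first appearance is the loosest one.
    """
    return [alpha / max(i, first_window) for i in range(1, n_hypotheses + 1)]


def prob_any_rejection(thresholds):
    """P(at least one p-value <= its threshold), ASSUMING independent
    uniform p-values under the global null."""
    return 1.0 - prod(1.0 - t for t in thresholds)


levels = loosest_thresholds()
print(f"union bound (sum of levels): {sum(levels):.4f}")
print(f"independent-tests figure:    {prob_any_rejection(levels):.4f}")
```

Under these assumptions, both figures come out above 0.05, which suggests that controlling the FWER within each run does not by itself control it across runs. How much of this carries over to correlated tests and to the actual stopping rule is not settled by this sketch.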

Do I also need to adjust across the steps?

I mean:

Step 1: H1 - H3, α=0.05/3 (3 comparisons, 1st run)
Step 2: H1 - H4, α=0.05/4, additionally divided by 2 because this is the 2nd run, giving α=0.05/8
Step 3: H1 - H5, α=0.05/5, additionally divided by 3 because this is the 3rd run, giving α=0.05/15
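To see how quickly the doubly corrected level shrinks, the steps above can be tabulated for all eight possible runs. This is a minimal sketch; `step_levels` is a hypothetical helper, and the rule "divide by the window size, then by the run number" is simply the pattern from the three steps listed above extended to the remaining ones.

```python
def step_levels(n_hypotheses=10, alpha=0.05, first_window=3):
    """Return (step, window_size, within_run_level, doubly_corrected_level) per run."""
    rows = []
    for step, k in enumerate(range(first_window, n_hypotheses + 1), start=1):
        within_run = alpha / k        # Bonferroni within the current window
        double = alpha / (k * step)   # additionally divided by the run number
        rows.append((step, k, within_run, double))
    return rows


for step, k, within_run, double in step_levels():
    print(f"step {step}: H1-H{k}  within-run={within_run:.5f}  double={double:.6f}")
```

By the last run (H1-H10, the 8th run) the doubly corrected level is 0.05/80, compared with 0.05/10 for a single Bonferroni correction over all ten hypotheses.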

This lets me find the difference earlier, but the penalty becomes horribly large at each step ($\alpha=0.05/15$ at the 3rd run), even though only 5 hypotheses are being tested (and the first 3 were already not rejected!).

Should I adjust twice or not, in your opinion?

FWER's rejection probabilities change if you change the "family" (e.g., if you added 1,000 *p* values from tests of true null hypotheses, your original rejections would almost certainly change to fail to reject). By contrast, [FDR](https://en.wikipedia.org/wiki/False_discovery_rate) methods scale across numbers of comparisons, so if you added 1,000 new tests, you simply come up with a more reliable estimate of the false discovery rate, and your rejection probabilities are conserved. tl;dr: With FWER if you change the family of tests all bets are off; use FDR instead. — Alexis, Feb 27 '20 at 19:35
@Alexis, this is an interesting topic. What if FWER control is required, for example by a reviewer or regulator? For FDR there are methods such as online testing (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6792083/), but here the author seems to have a requirement on FWER. Of course, adding hypotheses changes everything, so the author proposes successive adjustments. The question is whether the significance level should be adjusted not only within a run (he/she uses Bonferroni) but also between the runs. Interesting problem. — Damasco, Feb 27 '20 at 23:19
FWER is a tad conceptually incoherent (e.g., it presumes *all* hypotheses are null, even in the face of mounting evidence to the contrary). Lots of folks use outmoded or inappropriate methods in application. Personally, I only teach FWER methods to illustrate the history of growth in statistical ideas, and prefer FDR always. — Alexis, Feb 28 '20 at 00:24
You are right. But in certain industries, such as medicine and pharma, assuming that all effects are non-existent is a common standard because of the conservativeness required by regulators, whether we like it or not. I have to use FWER all the time, with no option to use FDR; it would be questioned and rejected. Let me only mention that I am usually not allowed to use any method other than Bonferroni. So, do you have any advice on the case reported by the author? Is the double correction necessary for each run, or is controlling the FWER within each run sufficient? — Damasco, Feb 28 '20 at 00:36
@Alexis a note about FWER and the regulators. https://phastar.com/blog/157-multiplicity — Damasco, Feb 28 '20 at 17:12
@Damasco I do not deny that some folks require FWER (or that those folks may be regulators), but the author's question is an example of why FWER—specifically its concept of "family"—is, in technical terms, *some bullshit*. That said, why are you not permitting [Holm's FWER method](https://pdfs.semanticscholar.org/7f0a/29a89655d7998efc7bb53e695b3b950bf7fd.pdf)? Asking if I have any advice about the "choice" available in "Choose one of the following methods: (a) Bonferroni." seems like an odd question. If your hands are tied and you have no choice, then live with incoherence and be done with it. — Alexis, Feb 28 '20 at 17:20
@Alexis Sure, I thought you might find this interesting :) I agree with you. My question about proposals concerned the main topic we are commenting on, where the author corrects the comparisons per "run" (with Bonferroni, as far as I can see) but also wonders whether an additional correction is necessary between the incremental "runs". — Damasco, Feb 28 '20 at 17:58
@Damasco Ah! So sorry. I misunderstood what you were asking. But I would say, see my original comment: different "runs" = different "families", right? — Alexis, Feb 28 '20 at 20:09
