
A question: I have a study with 3 experiments, each involving a between-subjects comparison of a treatment vs. control group. The groups in each experiment are independent samples and no groups are repeated across experiments. A t-test is run for each experiment.

A reviewer is asking for correction for multiple comparisons. It seems odd to perform such a correction across experiments, especially since we only ran one t-test per experiment. However, in all 3 experiments we are interested in whether there is an effect of the treatment (albeit under different conditions in each case). Is Bonferroni the appropriate method here? What other methods might be preferable? Are there citeable works on whether it is appropriate to do such corrections?

Note: a somewhat similar question was asked here: correcting for multiple comparisons with independent groups

Thanks for any advice!

Altair555
  • Have a look at this answer [here](https://stats.stackexchange.com/questions/225937/linear-mixed-effects-model-and-multiplicity-issue-and-adjusting-for-p-values/226215#226215). It also provides some references that may help you in your specific situation. In general, the appropriate method will depend on your experiment and what you want to control for. – Stefan Mar 17 '18 at 13:50

2 Answers


A Bonferroni correction is conservative, but always an option, because the Bonferroni inequality requires no assumptions about the relationships among the events. It exerts the strongest control over the simultaneous error probability, $\Pr(\text{at least one type I error})$, unconditionally, and may also be used to construct (conservatively) a set of confidence intervals whose simultaneous confidence level exceeds the specified one. For just 3 tests, it usually isn't all that conservative, so it has much to recommend it when this strong type of control over the error rate is desired.
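
For concreteness, here is a minimal sketch in R; the three p-values are hypothetical placeholders standing in for the per-experiment t-test results.

```r
# Bonferroni adjustment of three per-experiment p-values (placeholders).
# p.adjust() multiplies each p-value by the number of tests (capped at 1);
# equivalently, compare each raw p-value to alpha / 3.
p_raw <- c(0.012, 0.034, 0.210)
p.adjust(p_raw, method = "bonferroni")
#> [1] 0.036 0.102 0.630   (reject where adjusted p < alpha)
```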

You could also use a slightly less conservative adjustment (the Šidák correction) and run each test at an individual significance level of $1-(1-\alpha)^{1/3}$ (about 0.017 with $\alpha=0.05$). This comes from setting the probability of at least one type I error among 3 independent tests equal to $\alpha$, and it is justified here because the experiments are independent. Again, this controls the overall error probability, and supports simultaneous confidence intervals, at the specified level.
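
The per-test level is easy to compute directly; a one-line check in R:

```r
# Per-test level so that 3 independent tests have familywise error alpha:
# solve 1 - (1 - alpha_ind)^3 = alpha for alpha_ind.
alpha <- 0.05
alpha_ind <- 1 - (1 - alpha)^(1/3)
alpha_ind  # ~0.01695, slightly larger than the Bonferroni cutoff 0.05/3 ~ 0.01667
```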

Some other simultaneous-testing procedures gain power by not testing certain comparisons at all when other comparisons are already deemed non-significant. Such procedures offer various levels of control over the error rate. Methods that control the false-discovery rate (FDR) provide the weakest level of control (other than no correction at all). It is unclear how one might apply such methods to the situation described in the OP. But I suppose one possibility is to fit a model with a block effect for the three experiments and a treatment factor with 4 (or 6?) levels: control plus treat1 through treat3 (or perhaps 3 levels of control?). A Dunnett-style test could then be used for the comparisons of interest. The Dunnett method has the same strong control over the error rate as Bonferroni, and is slightly more powerful, though I suspect not by much. An FDR method could be applied using something like the R function p.adjust() with the unadjusted $P$ values.
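
To illustrate both routes, here is a hedged sketch in R. The pooled model, the simulated data, and the use of the multcomp package are assumptions about one way this could be set up, not a prescription; the p.adjust() call is the FDR route just mentioned.

```r
library(multcomp)  # provides glht() and mcp() for Dunnett-style contrasts

# Simulated placeholder data: 3 experiments (blocks), each with its own
# control arm pooled into a common "control" level, as in the 4-level
# treatment factor suggested above.
set.seed(1)
dat <- data.frame(
  block = factor(rep(1:3, each = 40)),
  trt   = factor(rep(c("control", "treat1", "control", "treat2",
                       "control", "treat3"), each = 20))
)
dat$y <- rnorm(nrow(dat)) + 0.5 * (dat$trt != "control")

# Dunnett-style comparisons of each treatment against the shared control,
# with the experiment entering as a block effect
fit <- lm(y ~ block + trt, data = dat)
summary(glht(fit, linfct = mcp(trt = "Dunnett")))

# The FDR route: Benjamini-Hochberg adjustment of the three raw
# per-experiment p-values (placeholders)
p.adjust(c(0.012, 0.034, 0.210), method = "BH")
```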

A good reference for all the issues under discussion is Chapter 5 of Oehlert's experimental design text, which is openly available under a Creative Commons license: http://users.stat.umn.edu/~gary/book/fcdae.pdf

Russ Lenth
  • The Bonferroni adjustment is always wrong in that it (a) always assumes all null hypotheses are true, even after some have been rejected, and thereby hemorrhages statistical power, and (b) its rejection probabilities are conditional on an undefined and loosey-goosey concept of a "family of tests," and change when the number of tests changes. The method is 55 years old, and has been supplanted by control of the [false discovery rate](https://en.wikipedia.org/wiki/False_discovery_rate) (FDR), which suffers neither of these problems. – Alexis Mar 17 '18 at 05:53
  • There are different types of control over the simultaneous error rate. See for example Chapter 5 of Gary Oehlert’s book, http://users.stat.umn.edu/~gary/book/fcdae.pdf. FDR methods establish only weak control. – Russ Lenth Mar 17 '18 at 14:04
  • @Alexis Can you explain how you conclude that the Bonferroni adjustment is always wrong? (Maybe you want to elaborate in a separate answer.) It's a very strong statement. Is it your personal opinion, or a claim that has been established in the literature? I am just curious. I was always under the impression that there is no perfect test that can be applied across all problems you may encounter. It all depends on your data, your hypotheses, and other assumptions specific to your experiment. – Stefan Mar 17 '18 at 14:05
  • @Stefan the Bonferroni adjustment is not a test: it's a method of **either** adjusting *p*-values, **or** adjusting the rejection threshold under multiple testing. The two claims I specifically articulated are backed up in the literature (some of which is cited in the link I provided), but here's a solid history of the FDR: Benjamini, Y. (2010). [Discovering the false discovery rate](https://pdfs.semanticscholar.org/2d8b/c065924c4729e15b565eca0e7b06a8077b2c.pdf). *Journal of the Royal Statistical Society Series B (Statistical Methodology)*, 72(4):405–416. – Alexis Mar 17 '18 at 16:43
  • @Stefan As a method of adjustment it (1) unnecessarily hemorrhages statistical power (even for a [FWER](https://en.wikipedia.org/wiki/Family-wise_error_rate) method see [Holm's methods](https://pdfs.semanticscholar.org/7f0a/29a89655d7998efc7bb53e695b3b950bf7fd.pdf)), and (2) "family of tests" has no formal meaning, but the results on rejection probabilities depend on the choice of family. – Alexis Mar 17 '18 at 16:47
  • @Alexis I understand what you are saying. That was not my problem. I certainly agree that FDR methods are perfectly fine for addressing the issues surrounding multiple comparisons. However, it still doesn't support your claim that the Bonferroni adjustment is **always** wrong. – Stefan Mar 17 '18 at 17:01
  • @Stefan Can you falsify my claim? (Other than in some "A broken clock is right twice a day" fashion?) – Alexis Mar 17 '18 at 18:04
  • @Alexis Why should I? I asked you for references to support your claim that the Bonferroni adjustment is **always** wrong. So it's up to you to either do it or leave it. And so far, you have provided a link with references and a single paper to support that claim. In my opinion this isn't sufficient, and if this claim were made in a manuscript, it would almost certainly not pass the peer-review process. – Stefan Mar 17 '18 at 18:17
  • @Alexis seems to be arguing that the FDR is always the rate one should control. But I don't think many statisticians agree with that. For one thing, sometimes people want simultaneous confidence intervals, and the FDR doesn't work for that. By the way, most popular FDR methods actually use the Bonferroni inequality, but this is done stepwise with families of different sizes. – Russ Lenth Mar 17 '18 at 18:29
  • Note: just wanted to link to the following which details typical standards in the field for appropriate FDR values: https://stats.stackexchange.com/questions/252937/benjamini-hochberg-choosing-the-false-discovery-rate-q-value – Altair555 Mar 17 '18 at 18:36
  • I edited my answer, adding more details and changing "always correct" to "always an option". – Russ Lenth Mar 17 '18 at 19:22
  • @Altair555 The answer in the link you provided ignores the fact that FDR methods are step-down. The adjusted *p*-values are uninterpretable with respect to rejection decisions without also knowing the ordering of the unadjusted *p*-values. – Alexis Mar 17 '18 at 20:51
  • @Stefan I believe I have answered you. You are dissatisfied with my answers, which is fine. But perhaps you can help me understand what standard of evidence you are looking for to evaluate my claim "The Bonferroni adjustment should **never** be preferred to control of the FDR"? – Alexis Mar 17 '18 at 21:08
  • I believe my revised answer gives an example of such a scenario. – Russ Lenth Mar 17 '18 at 22:02
  • rvl, "because the Bonferroni inequality requires no assumptions about the relationships among the events." is missing the dependent clause "even when evidence to the contrary accumulates." (I.e. suffers from my point #1) You also conflate stepping procedures with FDR: the Holm-Bonferroni and Holm-Šidak are both step-up, and both FWER). – Alexis Mar 18 '18 at 00:39
  • rvl, The Bonferroni adjustment also provides different rejection probabilities if, for example, three more independent experiments were conducted, for a fourth group to be compared to the first three: is the family 3 or 6 tests? Bonferroni's rejection probabilities *change* depending on the size of the "family," which is effectively arbitrary. This gets at my point #2. FDR does not have these issues, ergo: Bonferroni is never preferred over it. – Alexis Mar 18 '18 at 00:40
  • He only wanted to compare each treatment with control. That’s 3 comparisons. The rest of this is a retread of what you’ve said before. We don’t and won’t agree. – Russ Lenth Mar 18 '18 at 03:44
  • @Alexis (1) It's interesting to see how your claim that the "Bonferroni adjustment is always wrong" changed to "The Bonferroni adjustment should never be preferred to control of the FDR". This makes a meaningful discussion quite difficult. (2) I read the link and the paper, and **nowhere** do the authors conclude what you are claiming here (whether your first or your second claim). Instead the authors developed a methodology to help with the problem of multiple comparisons. So providing those links is certainly useful for learning about FDR, but not useful as evidence to support your claim. – Stefan Mar 18 '18 at 14:38
  • To me, rvl provided a thoughtful answer and also suggested solutions to the problems @Altair555 is facing. – Stefan Mar 18 '18 at 14:58
  • @Stefan I do not see the claim as having changed, but having been clarified. Given that methods exist that share neither of the Bonferroni adjustment's two major shortcomings, it is never correct to use it. But if you do not wish to articulate or clarify your desired standard of evidence, by all means go ahead and keep advocating for outdated methods that have been surpassed. – Alexis Mar 18 '18 at 16:04
  • @Alexis you are obviously missing the point. You made a pretty strong claim in your **first** comment. I asked for references to support this. You provided references that explain the FDR method, but none of these references actually suggests never using Bonferroni. It seems your claim is opinion-based, which is fine, but then it should be labeled as such. You don't seem to be reading the comments, because nowhere am I advocating for anything. If you think you can help the OP with his/her problem, I'd suggest you post an answer, since answers on CV can also be cited! – Stefan Mar 18 '18 at 18:35
  • @Stefan I have been justifying my **first** comment throughout this entire conversation. You have not articulated the standard of evidence by which you would like my first point justified. I think we're done here. – Alexis Mar 18 '18 at 18:52
  • @Alexis thanks for taking out the part from your last comment that I'm not being able to comprehend your argumentation. Much appreciated! – Stefan Mar 18 '18 at 19:38

Dunnett's multiple comparison test is used for multiple comparisons when there is a control group.

For reference: http://www.statisticshowto.com/dunnetts-test/
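
For illustration only, here is a minimal sketch in R of Dunnett's test in the single-experiment setting this answer describes (one control, several treatments); the DescTools package and the simulated data are assumptions for the example, not part of the original answer.

```r
library(DescTools)  # provides DunnettTest()

set.seed(2)
g <- factor(rep(c("control", "A", "B", "C"), each = 15))  # one control, 3 treatments
y <- rnorm(length(g)) + 0.6 * (g == "B")                  # simulated responses
DunnettTest(y, g, control = "control")  # each treatment vs. the control group
```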

OliverFishCode
  • I disagree. Dunnett's method applies when you have one control and several other treatments in ONE experiment. – Russ Lenth Mar 17 '18 at 02:39
  • It sounds like they should have done an ANOVA, based on the reviewer's comment. Almost as if each experiment is really a different factor or factor level. So I treated it like post hoc tests. If they strongly disagree with the reviewer and the experiments truly are independent, they should write a rebuttal. – OliverFishCode Mar 17 '18 at 02:48
  • Thanks for the thoughts. An ANOVA across all experiments might be worth considering. We already sidestepped the issue in our first invited revision, but this is the second round, and the reviewer is more insistent this time (yet rather vague, only stating "correct for multiple comparisons"). – Altair555 Mar 17 '18 at 06:42
  • I can't comment on the other answers, but here is a discussion of Bonferroni from another question: https://stats.stackexchange.com/questions/120362/whats-wrong-with-bonferroni-adjustments – OliverFishCode Mar 17 '18 at 14:07