
I am trying to get a sense of the correct adjusted Holm-Bonferroni significance levels $\alpha$ when dealing with multiple interaction terms. To simplify, assume one has the following model: $$ y=\beta_0 + \beta_1x_1 + \beta_2x_2+\beta_3x_3+\beta_4x_1x_3+\beta_5x_2x_3 + \varepsilon $$

I would publish any of the following statistical findings (Theories T):

T1a: $\beta_1\ne0$ and $\beta_4=0$

T1b: $\beta_1\ne0$ and $\beta_4\ne0$

T2a: $\beta_2\ne0$ and $\beta_5=0$

T2b: $\beta_2\ne0$ and $\beta_5\ne0$

The crucial point is that I would not publish $\beta_1=0$ and $\beta_4\ne0$ and would also not publish $\beta_2=0$ and $\beta_5\ne0$. Thus, I test $\beta_4$ and $\beta_5$ only conditional on $\beta_1$ and $\beta_2$, respectively, being significant.

Let's assume now, we have the following p-values: 0.01 for $\beta_1$, 0.02 for $\beta_2$, 0.04 for $\beta_4$ and 0.05 for $\beta_5$. We test for the overall significance level $\alpha=0.05$.

Using the Holm-Bonferroni approach, I would assume that the significance level for $\beta_1$ is $\alpha/2=0.025$. That is because the interaction terms $\beta_4$ and $\beta_5$ are never tested before $\beta_1$ and $\beta_2$, respectively, are significant, so at this stage only two tests are in play. For $\beta_2$, I would also put the significance level at $\alpha/2=0.025$: I am not testing $\beta_5$ before $\beta_2$, and $\beta_1$ has already been dealt with, so at this point only $\beta_2$ and $\beta_4$ are under consideration. For $\beta_4$, I would again put the significance level at $\alpha/2=0.025$, because $\beta_1$ and $\beta_2$ have already been dealt with - from here on this is identical to the standard Holm-Bonferroni procedure. Lastly, for $\beta_5$ the significance level is $\alpha/1=0.05$ as usual.
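
For contrast, here is a minimal sketch of what the standard (unconditional) Holm-Bonferroni procedure would do with these four p-values, ignoring the conditional structure above. The use of Python and statsmodels is my own assumption and not part of the question.

```python
# Minimal sketch: standard (unconditional) Holm-Bonferroni applied to the four
# p-values from the example, ignoring the conditional testing structure.
# The use of statsmodels here is an assumption, not part of the original question.
from statsmodels.stats.multitest import multipletests

# p-values for beta_1, beta_2, beta_4, beta_5 (in that order)
pvals = [0.01, 0.02, 0.04, 0.05]

reject, p_adjusted, _, _ = multipletests(pvals, alpha=0.05, method="holm")

# Holm compares the k-th smallest p-value against alpha / (n - k + 1):
# 0.01 vs 0.05/4 = 0.0125   -> reject
# 0.02 vs 0.05/3 ~ 0.0167   -> fail to reject, and the procedure stops here,
# so only the null hypothesis for beta_1 is rejected by the standard procedure.
for name, p, r in zip(["beta_1", "beta_2", "beta_4", "beta_5"], pvals, reject):
    print(f"{name}: p = {p:.2f}, reject H0: {r}")
```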

Thus in this example, I would accept T1a and T2a and reject T1b and T2b using the Holm-Bonferroni correction. Is my dealing with the interaction effects $\beta_4$ and $\beta_5$ correct?

NOTE 1: In my actual problem, I have two separate models $y=\beta_0 + \beta_1x_1 +\beta_3x_3+\beta_4x_1x_3 + \varepsilon $ and $y=\beta_0 + \beta_2x_2+\beta_3x_3+\beta_5x_2x_3 + \varepsilon $ fitted with two different datasets. But I don't think this makes any difference in this case.

NOTE 2: The question "How to apply Bonferroni correction when including an interaction term?" has some similarities to this one, but I am not sure whether the answers also apply to my question.

Tom Pape

1 Answer


So you have the following situation:

Test model 1 on data 1, if significant -> additionally test model 3 on data 1
Test model 2 on data 2, if significant -> additionally test model 4 on data 2

Because these two approaches use different data sets, they are two separate processes.
The answer will also depend on your objective / primary hypotheses.

If you decide to test both models (say models 1 and 3) regardless of the results of model 1, then you would adjust all the p-values from both models together.
If, however, you decide to conduct additional tests based on previous results (only if they are significant), this is generally what people call "exploratory analysis", and different rules apply here. You will find different answers, but in general you would not adjust the p-values from the second model, since these results would be only "exploratory" and not "confirmatory"; you would, however, still adjust the p-values from the first model (as sketched below).
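
To make the two strategies concrete, here is a minimal sketch in Python; statsmodels and the exact grouping of the tests are my own assumptions, and the p-values are simply taken from the question.

```python
# One possible reading of the two strategies, using the p-values from the question:
# beta_1 (model 1) and beta_4 (model 3) on data 1, beta_2 (model 2) and beta_5 (model 4) on data 2.
from statsmodels.stats.multitest import multipletests

p_b1, p_b4 = 0.01, 0.04   # data set 1
p_b2, p_b5 = 0.02, 0.05   # data set 2

# Strategy A: all four tests are pre-specified (confirmatory), so within each
# process the p-values of both models are adjusted together.
reject_1, adj_1, _, _ = multipletests([p_b1, p_b4], alpha=0.05, method="holm")
reject_2, adj_2, _, _ = multipletests([p_b2, p_b5], alpha=0.05, method="holm")
print("Confirmatory, data 1:", reject_1, adj_1)
print("Confirmatory, data 2:", reject_2, adj_2)

# Strategy B: the interaction models are only fitted after a significant
# first-stage result (exploratory follow-up), so only the pre-specified
# first-stage p-values are adjusted and the follow-up p-values are reported unadjusted.
reject_stage1, adj_stage1, _, _ = multipletests([p_b1, p_b2], alpha=0.05, method="holm")
print("First stage (adjusted):", reject_stage1, adj_stage1)
print("Follow-up (unadjusted, exploratory):", p_b4, p_b5)
```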

Also, I am not sure what you mean by $\alpha/2$; if you are using Bonferroni, then you divide by $n$, the number of conducted tests, not the number of models.
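
For instance (my arithmetic, not part of the original answer): with the $n=4$ tests from the question, plain Bonferroni would use
$$\frac{\alpha}{n}=\frac{0.05}{4}=0.0125$$
for every test, while Holm compares the $k$-th smallest p-value against $\alpha/(n-k+1)$.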

user2974951
  • Thank you for your detailed answer. Yes, your understanding of the question is correct. – Tom Pape Jan 21 '19 at 15:09
  • Note that the number of models is the number of tests in my example (I will clarify this now in the question by calling them competing hypotheses). I am in the process of submitting a pre-registered trial with the above-mentioned four hypotheses, which appears to me to be a confirmatory design. In my project, model 1 tests for a cognitive bias and model 3 tests for a way to partially unbias. Clearly, without the bias one does not need to test whether the method for unbiasing works. Thus, would you now say that my reasoning makes sense, or what would be the right adjusted significance levels? – Tom Pape Jan 21 '19 at 15:12
  • @TomPape I don't really follow, I don't know how you are testing for bias with a hypothesis test. Also I don't know what you mean by models "competing". – user2974951 Jan 22 '19 at 13:38
  • Thank you for your patience with me. I still need to learn how to explain statistical questions cleanly. I have now completely rewritten the question. Does this make things clearer for you? If not, please just let me know what causes confusion. – Tom Pape Jan 23 '19 at 10:10
  • @TomPape I think you are overcomplicating things; it is not clear what you are trying to show with this analysis. Why not test both coefficients at the same time, as opposed to doing sequential tests? If you are testing whether a coefficient is different from 0 or not, you can do this all in one go. – user2974951 Jan 25 '19 at 13:53
  • I guess what I am trying to do is to "tighten" the Holm-Bonferroni rule for this sequential case. But I guess you are right that this isn't straightforward to do. I just had a look at the two proof options for the Holm-Bonferroni method (induction, closed-testing procedure), but wouldn't know how to modify them for my special case. Thus perhaps this is just too complicated, and it is better to give up and do it the standard way. – Tom Pape Jan 25 '19 at 16:09
  • @TomPape It is general advice to try and put all your hypotheses into one model, that is, define them all beforehand, instead of doing an "if then" kind of nested procedure. The method that you proposed is sure to raise eyebrows, and if you are trying to write this for a paper then any good statistical reviewer would question this choice. – user2974951 Jan 27 '19 at 08:29
  • "It is general advice to try and put all your hypotheses into one model". It think you are right on that. And if I am honest, if I read a paper and the p-value of its main conclusion just happened to be a tine bit below the threshold needed (which was in turn derived also by a complicated statistical 'streghtening' approach which almost no reader can follow), I would also start wondering! And a p-value of 0.0125 for the main conclusions shouldn't be too tall an order, even for an experiment. Thus thanks a lot for your last comment, which I found really helpful :) – Tom Pape Jan 28 '19 at 07:56