1

I have a 2x3x4 repeated measures ANOVA. I have a significant three-way interaction, and I want to make sure that I am using the correct post hoc comparisons and not violating any key statistics theory.

I have run the statistics in SPSS and adjusted for multiple comparisons using the Sidak (SIDAK) correction, but I want to make sure I understand how many adjustments are being made and whether the p-values are being corrected appropriately.

My post hoc tests have analysed the following:

AxB at each level of C

AxC at each level of B

BxC at each level of A

I am just trying to determine what correction factor is appropriate. For example, BxC at each level of A compares 2 means 12 times. Here, am I not adjusting because only 2 means are being compared, or am I dividing the alpha value by 12 because there are 12 comparisons?

Similarly, for AxC at each level of B, I am comparing 3 means (as B has 3 levels) at each combination of AxC (8 combinations). Here, would I be dividing the alpha value by 3, or by the number of combinations times the means compared, hence 24?

Similarly, for AxB at each level of C, I am comparing 4 means (5 comparisons total) at each combination of AxB (6 total). Here, would I be dividing the alpha value by 5, or by the number of combinations times the comparisons, hence 30?

Alternatively, would I be dividing the alpha value by the sum of all of the above comparisons?
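To make the arithmetic I am asking about concrete, here is a quick sketch in Python (purely illustrative; 0.05 stands in for my family-wise alpha, and the divisors are just the candidates described above):

```python
# Each candidate divisor mentioned above, and the per-comparison alpha it would give.
alpha = 0.05

candidate_divisors = {
    "BxC at each level of A: 12 comparisons": 12,
    "AxC at each level of B: 3 means": 3,
    "AxC at each level of B: 8 combinations x 3": 24,
    "AxB at each level of C: 5 comparisons": 5,
    "AxB at each level of C: 6 combinations x 5": 30,
    "sum of the per-level counts (one reading of 'the sum of all the above')": 12 + 24 + 30,
}

for label, m in candidate_divisors.items():
    print(f"{label}: alpha / {m} = {alpha / m:.5f}")
```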

Thanks

user28327
  • 71
  • 6
  • I think you'll get better answers if you give some scientific context to your data, and state what scientific questions you hope the interaction P values will answer. – Harvey Motulsky Sep 03 '13 at 17:29
  • Thanks for this. What I am trying to determine is whether factor A, which is exposure 1 or 2, at each level of factor B, which is intensity (low, medium and high), elicits different responses over time (factor C). I hope this helps. – user28327 Sep 03 '13 at 18:23

3 Answers

2

I'll be the bunny who points out that not every set of scientific questions requires a 'correction' for multiplicity of tests and not every approach to statistical inference involves keeping track of type I errors. (Prepares self for down-votes!)

You wish to be sure that you are "not violating any key statistics theory", but if you make adjustments to p-values (or critical thresholds) then you can be assured that you will be violating the likelihood principle in order to comply with the repeated sampling principle. If you wish to behave as a pure frequentist then comply with the repeated sampling principle at all costs: adjust away. However, if you wish to deal directly with the evidence in your data then you cannot be a pure frequentist because you have to comply with the likelihood principle.

If you are interested in the evidence then it is helpful to know that the evidence itself is unaffected by multiplicity of testing (and by stopping rules). Of course if you wish to make a decision on the basis of the evidence it is perfectly OK to take multiplicity into account.

For your particular application I would imagine that you might be able to answer your substantive question using a hierarchical model instead of a bunch of frequentist error rate-adjusted hypothesis tests. (Most Bayesian methods comply with the likelihood principle.) Here's an example that might help you see the bigger picture: http://www.stat.columbia.edu/~gelman/research/published/multiple2f.pdf
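As a rough illustration of the direction I mean, here is a minimal mixed-effects sketch in Python's statsmodels, assuming long-format data with one row per subject per cell and hypothetical column names taken from your comment (exposure for A, intensity for B, time for C). A fully Bayesian hierarchical model, as in the linked paper, would take this further:

```python
# Minimal sketch, not a prescription: one hierarchical (mixed-effects) model
# in place of many error-rate-adjusted simple-effects tests.
# Assumes a hypothetical long-format file with columns:
# subject, exposure (A), intensity (B), time (C), response.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("responses_long.csv")  # hypothetical file name

model = smf.mixedlm(
    "response ~ C(exposure) * C(intensity) * C(time)",  # full factorial fixed effects
    data=df,
    groups=df["subject"],                               # random intercept per subject
)
result = model.fit()
print(result.summary())
```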

Michael Lew
  • 10,995
  • 2
  • 29
  • 47
  • Has "evidence" been co-opted here to mean "likelihood"? If so, fair enough (cf "bias"), but worth clarifying. – Scortchi - Reinstate Monica Feb 02 '14 at 13:04
  • @Scortchi Yes, likelihood does provide a sensible and defensible measure of evidence. However, I did not mean to 'co-opt' evidence in my answer. The properties that evidence naturally takes (in my mind, at least; not yours?) mean that it is not affected by multiplicity or by sequential sampling. The sometimes very indirect approach of frequentist ANOVA approaches to substantive questions, coupled with the non-evidential nature of their results, means that it is often a good idea to consider other approaches. – Michael Lew Feb 02 '14 at 20:10
  • In my mind too actually (though I think it's OK to take more than the evidence into account even outside decision-making situations), but as "evidence" isn't a standard statistical term, I thought it worth clarifying whether it was being used instead of one or whether you were asserting something more. – Scortchi - Reinstate Monica Feb 02 '14 at 21:41
  • @Scortchi Am I alone in thinking it quite bizarre that evidence isn't a standard statistical term? I suspect the reason involves a mistrust of the likelihood principle. If the likelihood principle is the only proper route to evidence then discarding it precludes a statistical notion of evidence. (This may be worthy of a question.) – Michael Lew Feb 02 '14 at 22:39
1

Fitting one linear model (which is what ANOVA does) can be viewed as an omnibus test: you are checking whether at least one of the relationships is significant. If ANOVA gives you p>0.05, then all post-hoc tests are redundant and are non-significant regardless of their p-values. When the ANOVA p-value is significant, you may want to perform post-hoc tests; one good way to do this is Tukey's HSD test.

As for your design: to perform a Bonferroni adjustment for multiple comparisons, you should divide the alpha value (for all results from this design) by the total number of comparisons, which is the number of levels in A + the number of levels in B + the number of levels in C. That sum is 2+3+4=9, as I understand from your description. Note that in the example "AxB at each level of C" you don't compare 2 means 12 times, but rather 12 means two times, so the number of comparisons is 2. But since you will also test the other contrasts (AxC at each level of B, and so on), you should not divide alpha by two in one case and by three or four in the others; divide by nine in all of them.
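To make that adjustment concrete, a quick sketch of the numbers under this counting (Python, purely illustrative; nine comparisons as reasoned above):

```python
# Per-comparison alpha under the counting used in this answer (n = 9).
alpha_family = 0.05
n = 2 + 3 + 4  # levels of A + levels of B + levels of C

alpha_bonferroni = alpha_family / n                  # simple Bonferroni division
alpha_sidak = 1 - (1 - alpha_family) ** (1 / n)      # Sidak form of the same idea
print(f"Bonferroni: {alpha_bonferroni:.5f}, Sidak: {alpha_sidak:.5f}")
```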

O_Devinyak
  • 2,118
  • 17
  • 14
  • Thanks for the response. I just wanted to follow up, as I have used a SIDAK correction with SPSS, which adjusts AxB at each level of C by 5 (as there are 4 levels of C), AxC at each level of B by 3 (as there are 3 levels of B), and BxC at each level of A by 0 (as there are only 2 levels). I wonder why SPSS would adjust in such a way if it is incorrect. – user28327 Sep 03 '13 at 19:01
  • SPSS doesn't know that after one analysis (AxB conditional on C) you'll perform another. That is why you should compute your alpha manually, using the formula for the Sidak correction with `n` = 9. – O_Devinyak Sep 04 '13 at 14:36
  • 3
    @O_Devinyak Your comment that ‘[i]f ANOVA gives you p>0.05, then all post-hoc tests are redundant and are non-significant regardless of their p-values’ is not true. Most post-hoc tests (including Tukey’s HSD) do *not* depend on the global test being significant. (And conditioning on the global test being significant actually changes the statistical properties of the post-hoc tests.) See http://stats.stackexchange.com/questions/9751/do-we-need-a-global-test-before-post-hoc-tests – Karl Ove Hufthammer Feb 01 '14 at 22:47
  • Now I see, I was wrong. Please feel free to edit or remove my answer, since I no longer know whether it was more helpful or more misleading. – O_Devinyak Feb 04 '14 at 15:08
0

As far as I know, the p-value isn't divided by a correction factor; it is the Type I error threshold (alpha) that is adjusted for the number of comparisons being made.
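A tiny numeric check of that distinction, using the Bonferroni form (a sketch with made-up numbers): comparing the raw p-value against alpha divided by the number of comparisons gives the same decision as comparing the multiplied p-value against the raw alpha.

```python
# Made-up numbers: family-wise alpha, number of comparisons, and one raw p-value.
alpha, m, p = 0.05, 9, 0.004

print(p < alpha / m)  # adjust the threshold: compare p against alpha / m
print(p * m < alpha)  # or adjust the p-value: compare p * m against alpha
# Both print True (or both False for other values): the two decisions agree.
```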

RobertF
  • 4,380
  • 6
  • 29
  • 46
  • Yes, you are correct; in fact I meant multiplying the p-value by the number of comparisons or, equivalently, dividing alpha by the number of comparisons. – user28327 Sep 03 '13 at 18:19