
I've run into something I cannot understand while adjusting p-values with Holm's method.

It's not a problem in my actual data, but I would appreciate any help in understanding the issue.

To illustrate the question I created some p-values ranging from 0.04 down to very low values, then computed Bonferroni's and Holm's adjusted p-values:

myp <- data.frame("unadjusted" = 0.04 / c(1, 5, 10, 50, 100, 500, 1000)) # dummy p-values
myp$bonferroni <- p.adjust(myp$unadjusted, method = "bonferroni") # Bonferroni's method
myp$holm <- p.adjust(myp$unadjusted, method = "holm") # Holm's one
myp$holmreject0.05 <- ifelse(myp$holm < 0.05, "yes", "no") # test if adjusted p-value is below alpha threshold
format(myp, scientific = FALSE)

  unadjusted bonferroni    holm holmreject0.05
1    0.04000    0.28000 0.04000            yes   
2    0.00800    0.05600 0.01600            yes
3    0.00400    0.02800 0.01200            yes
4    0.00080    0.00560 0.00320            yes
5    0.00040    0.00280 0.00200            yes
6    0.00008    0.00056 0.00048            yes
7    0.00004    0.00028 0.00028            yes

Bonferroni is easy to understand: 7 comparisons, so each p-value is multiplied by 7. With Holm I was quite surprised to notice that the first p-value (0.04) wasn't modified at all. Reading the Wikipedia article, I understand that Holm's method penalizes the lowest p-value most strongly and then works its way up to the highest one, which is penalized least (a step-down method). With these dummy p-values, the multipliers run from 7 (for the lowest) down to 1 (for the highest).
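To check my understanding, I wrote out the step-down procedure by hand. This is a sketch in Python rather than R (the function and variable names are my own, not from any library), of what I believe p.adjust(method = "holm") computes: sort the p-values, multiply the i-th smallest by m - i + 1, and keep a running maximum so the adjusted values never decrease.

```python
# A step-by-step sketch of Holm's step-down adjustment (my own
# implementation, intended to mirror R's p.adjust(method = "holm")).
def holm_adjust(pvals):
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = (m - rank) * pvals[idx]        # multipliers m, m-1, ..., 1
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = min(1.0, running_max)      # cap at 1, as p.adjust does
    return adjusted

dummy = [0.04 / k for k in (1, 5, 10, 50, 100, 500, 1000)]
print([round(a, 5) for a in holm_adjust(dummy)])
# -> [0.04, 0.016, 0.012, 0.0032, 0.002, 0.00048, 0.00028]
```

This reproduces the holm column of the table above, including the untouched 0.04 in the first row.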

In this question, which is focused on a different issue, a user posted a table, and their highest p-value was not modified either (sixth row): it too was multiplied by 1.

Going further, if I take the data from the Wikipedia example of the method, the highest p-value is in fact modified:

WikipediaExample <- data.frame("unadjusted" = c(0.01, 0.04, 0.03, 0.005)) # unadjusted p-values
WikipediaExample$adjusted <- p.adjust(WikipediaExample$unadjusted, method = "holm") # Holm's adjusted
WikipediaExample$reject0.05 <- ifelse(WikipediaExample$adjusted < 0.05, "yes", "no") # over/below alpha threshold 0.05
WikipediaExample

  unadjusted adjusted reject0.05
1      0.010     0.03        yes
2      0.040     0.06         no  # highest p-value, multiplied by 1.5
3      0.030     0.06         no
4      0.005     0.02        yes

I cannot understand how it is possible that in the Wikipedia example, with 4 comparisons, p = 0.04 ends up over the 0.05 threshold, while in the dummy data I created, with 7 comparisons, p = 0.04 stays below the 0.05 threshold. I expected that the more comparisons there are, the more a p-value would be penalized...

Thanks in advance.


Note: my maths level (I'm a physician) is not enough to follow the formula for adjusting p-values. I can understand, and check with pencil and paper, the first part of the Wikipedia entry on the rationale and the formulation.

1 Answer


One way to think of this: Suppose you have a bunch of tests that all have $p<0.05$. The multiple comparisons make it easier to get one $p$-value less than $0.05$, but make it harder to get all the $p$-values less than 0.05. No matter what the correlation between these tests, that's going to happen less than 5% of the time under the null.

So the largest $p$-value can legitimately stay below 0.05 after adjustment, and in your dummy data it in fact ends up not changing at all.
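The mechanics behind this are visible if you work through the Wikipedia numbers by hand (a Python sketch with my own variable names): the largest p-value's own candidate, 1 × 0.04 = 0.04, is smaller than the adjusted value already reached at the previous step (2 × 0.03 = 0.06), and Holm's monotonicity rule, a running maximum, carries the 0.06 forward. In the dummy data, by contrast, 1 × 0.04 is itself the largest candidate, so the top p-value passes through unchanged.

```python
# Holm on the Wikipedia example, with the running maximum written out
# explicitly (my own sketch of the step-down procedure).
pvals = sorted([0.01, 0.04, 0.03, 0.005])  # [0.005, 0.01, 0.03, 0.04]
m = len(pvals)
adjusted, running_max = [], 0.0
for rank, p in enumerate(pvals):
    candidate = (m - rank) * p                 # multipliers 4, 3, 2, 1
    running_max = max(running_max, candidate)  # never let adjusted values decrease
    adjusted.append(min(1.0, running_max))
print([round(a, 5) for a in adjusted])
# -> [0.02, 0.03, 0.06, 0.06]
```

So the 0.04 in the Wikipedia example is not really "multiplied by 1.5"; it inherits the 0.06 produced by the third p-value via the running maximum.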

Thomas Lumley
  • I think I get the point: if we have a lot of comparisons, the last ones (the largest) will be less impacted (divided by a number closer to 1). In the Wikipedia example, since few comparisons were made, even the last ones are still quite impacted. – Miguel Menéndez Jun 05 '20 at 14:17
  • 1
    Yes, exactly. The largest one isn't (obviously!) being selected for being small, so it doesn't have to be penalised for selection. – Thomas Lumley Jun 05 '20 at 22:41