Test for significant excess of significant p-values across multiple comparisons

Question

I have what feels like a simple question, but I was unable to find an answer easily.

The situation

Let's say I have a gene microarray dataset with tens of thousands of genes and a small number (<100) of samples. I am interested in simple mean differences between two sample groups. I do a t-test for each gene and get p-values, but none of them survive Bonferroni correction for multiple testing.

However, I also see that 8% of the genes are significant at the uncorrected 0.05 level, which I think is above chance. So instead I would like to claim that there are more significant genes than expected.

The problem

It feels like I cannot simply state that I expect 5%, that 8% is above that, and therefore I have more, because the genes are most likely not independent. Under dependence, getting 8 percent or more by chance may not be unlikely.

So instead I permuted the sample labels to see what fraction of permutations gives 8% or more genes with significant differences. If only 1 percent of permutations give 8% or more significant differences, I state that there are more significant genes than expected, with a permutation p-value of 0.01.
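The permutation procedure described above can be sketched roughly as follows. This is a minimal illustration, not the exact code used: it assumes a pooled-variance two-sample t statistic, and it declares a gene "significant" when |t| exceeds a fixed critical value, which is equivalent to thresholding the p-value as long as group sizes stay the same under permutation. The names `t_stat`, `frac_significant`, `excess_permutation_test` and the parameter `t_crit` are hypothetical and chosen for illustration only. Whole sample labels are shuffled, so correlations between genes are preserved in every permutation.

```python
import random


def t_stat(x, y):
    """Pooled-variance two-sample t statistic for two lists of values."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    sp2 = (ssx + ssy) / (nx + ny - 2)
    if sp2 == 0:
        # Degenerate gene with no within-group variation.
        return 0.0 if mx == my else float("inf")
    return (mx - my) / (sp2 * (1 / nx + 1 / ny)) ** 0.5


def frac_significant(data, labels, t_crit):
    """Fraction of genes (rows of data) with |t| > t_crit under this labelling."""
    hits = 0
    for row in data:
        a = [v for v, g in zip(row, labels) if g == 0]
        b = [v for v, g in zip(row, labels) if g == 1]
        if abs(t_stat(a, b)) > t_crit:
            hits += 1
    return hits / len(data)


def excess_permutation_test(data, labels, t_crit, n_perm=1000, seed=0):
    """Return (observed fraction, permutation p-value) for the excess test.

    The p-value is the share of label permutations whose fraction of
    significant genes is at least the observed one; the +1 terms count
    the observed labelling itself so the p-value is never exactly zero.
    """
    rng = random.Random(seed)
    observed = frac_significant(data, labels, t_crit)
    shuffled = list(labels)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute sample labels, keeping genes together
        if frac_significant(data, shuffled, t_crit) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)
```

Here `data` is a list of genes, each a list of expression values in sample order, and `labels` is a list of 0/1 group labels. For two groups of 10 samples, for instance, `t_crit=2.101` corresponds to the two-sided 5% level with 18 degrees of freedom.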

The questions

Is this a valid approach?
Are there better alternatives?
Does anybody know of any literature related to this problem?

Your procedure looks valid. A possible alternative would be to combine p-values for all genes, using e.g. Fisher's method. See [wikipedia](https://en.wikipedia.org/wiki/Fisher%27s_method) and also our [tag:combining-p-values] tag. — amoeba, Sep 09 '15 at 15:04
Thank you a lot for the answer and references. One final question if I may - is stating that 8% is above the expected 5% without any follow-up test (like those permutations) a big mistake? I think I've seen a few papers do that. — Karolis Koncevičius, Sep 09 '15 at 15:15
Your permutation test is not a "follow-up" test (in the sense in which this term is usually used, namely a test following another test), it is the main test that allows you to say that 8% is significantly above 5%. Without this test such a statement would look very unconvincing, because 8% is pretty close to 5%. (By the way, I am only writing comments in the hope that somebody else will provide a more in-depth answer.) — amoeba, Sep 09 '15 at 15:22
The site looks a bit empty, a lot of unanswered questions. So your comments are most appreciated, @amoeba . Thanks a lot! — Karolis Koncevičius, Sep 09 '15 at 15:25
@KarolisKoncevičius empty? hardly. it's full of questions waiting to be answered. — shadowtalker, Sep 10 '15 at 02:00
@amoeba I think your two comments together constitute an answer. — shadowtalker, Sep 10 '15 at 02:02

score 6 · Accepted Answer · answered Nov 15 '16 at 13:21

There are a number of methods for combining $p$-values which could be considered.

Birnbaum, in his paper "Combining independent tests of significance" (available here), points out that the problem is poorly specified. This may account for the number of methods available and their differing behaviour. The null hypothesis $H_0$ is well defined: all $p_i$ have a uniform distribution on the unit interval. There are two classes of alternative hypothesis:

$H_A$: all $p_i$ have the same (unknown) non-uniform, non-increasing density,

$H_B$: at least one $p_i$ has an (unknown) non-uniform, non-increasing density.

If all the tests being combined come from what are basically replicates, then $H_A$ is appropriate, whereas if they are different kinds of test or come from different conditions, then $H_B$ is appropriate. Note that Birnbaum specifically considers the possibility that the tests being combined may be very different, for instance some tests of means, some of variances, and so on.

Of the methods with an eponym, Fisher's method (sum of logs, sum of $\chi^2_2$) and Tippett's method (minimum $p$) respond well when the alternative is $H_B$, whereas Stouffer's method (sum of $z$s) and Edgington's method (sum of $p$) may be preferred when $H_A$ is the alternative of choice.
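As a rough illustration of three of these methods, here is a minimal standard-library sketch. It assumes the $p$-values are independent and strictly between 0 and 1; the function names `fisher`, `tippett` and `stouffer` are hypothetical, not taken from any package, and Edgington's method is left out for brevity. The chi-squared tail in `fisher` uses the closed form for even degrees of freedom, which is only numerically safe for a modest number of $p$-values.

```python
import math
from statistics import NormalDist


def fisher(pvals):
    """Fisher: X = -2 * sum(log p) is chi-squared with 2k df under H0."""
    k = len(pvals)
    half = -sum(math.log(p) for p in pvals)  # this is X / 2
    # Upper tail of chi-squared with 2k df: exp(-X/2) * sum_{i<k} (X/2)^i / i!
    term = total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def tippett(pvals):
    """Tippett: the minimum of k uniform p-values has CDF 1 - (1 - x)^k."""
    return 1.0 - (1.0 - min(pvals)) ** len(pvals)


def stouffer(pvals):
    """Stouffer: sum of z-scores divided by sqrt(k) is standard normal under H0."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvals) / math.sqrt(len(pvals))
    return 1.0 - nd.cdf(z)
```

Each function takes a list of $p$-values and returns a single combined $p$-value. Because all three rely on independence, they would not be directly valid for correlated genes without some adjustment.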

Loughin's extensive simulations in "A systematic comparison of methods for combining $p$-values from independent tests" (available here) may also be of interest.

In the specific application you mention, it depends on whether you think just some of the genes are involved or all of them. Since my knowledge of genetics stops more or less with Mendel, I leave that up to you.

Thanks a lot for this answer. A bit late to accept, but just now noticed it. My situation is $H_A$ and I went with permutations that time but this seems like it will be useful in the future. — Karolis Koncevičius, Jun 14 '18 at 18:15

score 2 · Answer 2 · edited Sep 09 '15 at 22:02

About 10 years ago Bradley Efron wrote a number of papers on the subject. I think in one of them he also used the permutation approach, but the main idea was to estimate the null distribution from the data parametrically. You can find the corresponding R package instructions here.

answered Sep 09 '15 at 18:42 by James · edited Sep 09 '15 at 22:02 by amoeba
