
I have N p values (N is around 200), one for each of N independent experiments in my study. These p values range from 0.03 to 0.7 (the mean is 0.3), and only 5% of them are less than 0.05. If I look at the fraction of experiments in which the null hypothesis is rejected, I get this 5%, meaning that for the majority of the study (95%) the null hypothesis was not rejected.

If I combine the p values using different methods (Fisher's, Stouffer's, etc.), the resulting p value is lower than 0.05, which suggests that the null hypothesis does not hold in every experiment. Yet the individual p values suggest that it does hold in almost all of them. I would like a measure that reflects the general trend in the study. Do you think that the "rejection rate" is a valid way to report it? Or do you have any idea what the appropriate measure would be?

Thank you!

M_S
    5% of p-values below 0.05 is what we would expect if the null hypothesis were true. Therefore, at first glance, it seems that you haven't found any significant effect. – Pere Jul 26 '17 at 09:34
    And the relevant xkcd link: https://xkcd.com/882/ (probably, the most linked xkcd strip in Cross Validated). – Pere Jul 26 '17 at 09:35

1 Answer


It is perfectly possible for a method of combining $p$-values to give a value below a given threshold even if every $p$-value being combined is above that threshold. See the extended Q&A: Can a meta-analysis of studies which are all "not statistically signficant" lead to a "significant" conclusion?
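To make this concrete, here is a minimal sketch of Fisher's and Stouffer's methods using only the Python standard library. The function names `fisher_combined` and `stouffer_combined` and the ten illustrative $p$-values of 0.10 are my own assumptions, not data from the question; the Stouffer version is unweighted and one-sided.

```python
import math
from statistics import NormalDist


def fisher_combined(pvalues):
    """Fisher's method: -2 * sum(ln p_i) is chi-square with 2k df under the null."""
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # Fisher statistic divided by 2
    # For even df = 2k the chi-square survival function has a closed form:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def stouffer_combined(pvalues):
    """Unweighted, one-sided Stouffer's method: average the z-scores, rescale by sqrt(k)."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - nd.cdf(z)


# Hypothetical input: ten experiments, none individually significant at 0.05.
pvalues = [0.10] * 10
print("smallest individual p:", min(pvalues))
print("Fisher combined p:    ", fisher_combined(pvalues))
print("Stouffer combined p:  ", stouffer_combined(pvalues))
```

With these made-up inputs both combined values fall below 0.05 although no single $p$-value does, because ten results that each lean the same way are jointly unlikely under the null.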

What you are testing is the null hypothesis that the $p_i$ are drawn from a uniform distribution on the unit interval. Unfortunately, there are two classes of alternative hypothesis: (1) $H_A$: all $p_i$ have the same (unknown) non-uniform, non-increasing density; (2) $H_B$: at least one $p_i$ has an (unknown) non-uniform, non-increasing density. This may explain why there are so many candidate methods for combining the $p_i$.
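As a rough check of the "rejection rate" against this uniform null: if every $p_i$ is uniform, each one falls below 0.05 with probability 0.05, so the number of rejections among $N$ independent experiments is Binomial($N$, 0.05). The sketch below computes the exact upper tail; the helper name `binomial_upper_tail` is hypothetical, and the counts (10 rejections out of 200) are my reading of "around 200" and "5%" in the question, not exact figures.

```python
import math


def binomial_upper_tail(n, k, prob):
    """P(X >= k) for X ~ Binomial(n, prob), summed exactly over the upper tail."""
    return sum(
        math.comb(n, i) * prob**i * (1.0 - prob) ** (n - i)
        for i in range(k, n + 1)
    )


# Assumed counts: about 200 experiments, about 5% (10) with p < 0.05.
n_experiments = 200
n_rejections = 10
tail = binomial_upper_tail(n_experiments, n_rejections, 0.05)
print("P(at least", n_rejections, "rejections under the uniform null) =", tail)
```

A tail probability that is nowhere near small means the count of rejections alone gives no evidence against the null. The rejection rate only uses which side of 0.05 each $p_i$ falls on, whereas the combination methods use the whole distribution of the $p_i$, which is why the two can disagree.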

Some references comparing the methods are given in the Q&A: What are good references for the different methods of combining p-values?

mdewey