Fisher's method and Stouffer's approach (the z-transform) for meta-analysis roughly follow the same scheme (a code sketch of both follows the list):
1. take the p-values from several experiments on the same hypothesis;
2. apply some monotonic transformation to them;
3. sum the transformed values;
4. compare the result against some well-known distribution.
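For concreteness, here is a minimal sketch of both methods as I understand them (the function names and the example p-values are mine, and I assume independent one-sided p-values throughout):

```python
import numpy as np
from scipy import stats

def fisher_combine(pvals):
    """Fisher: T = -2 * sum(log p_i) ~ chi-squared with 2n df under H0."""
    pvals = np.asarray(pvals, dtype=float)
    t = -2.0 * np.sum(np.log(pvals))
    return stats.chi2.sf(t, df=2 * len(pvals))

def stouffer_combine(pvals):
    """Stouffer: Z = sum(Phi^{-1}(1 - p_i)) / sqrt(n) ~ N(0, 1) under H0."""
    pvals = np.asarray(pvals, dtype=float)
    z = np.sum(stats.norm.isf(pvals)) / np.sqrt(len(pvals))
    return stats.norm.sf(z)

pvals = [0.02, 0.10, 0.30]
print(fisher_combine(pvals), stouffer_combine(pvals))
```

(For what it's worth, `scipy.stats.combine_pvalues` implements both, with `method='fisher'` and `method='stouffer'`.)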
Now, my question is: what is the rationale for step 2? Under the null hypothesis, p-values are uniformly distributed on $[0,1]$, so the sum of $n$ p-values already tends to a normal distribution (and the normal approximation is fairly good even for small $n$).
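A quick simulation (my own, just to check that claim) comparing the empirical 5th percentile of the sum of $n$ uniform p-values with the normal approximation of mean $n/2$ and variance $n/12$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
for n in (3, 5, 10):
    # Sum of n independent Uniform(0,1) p-values under H0, 100k replicates.
    sums = rng.uniform(size=(100_000, n)).sum(axis=1)
    q_emp = np.quantile(sums, 0.05)
    q_norm = stats.norm.ppf(0.05, loc=n / 2, scale=np.sqrt(n / 12.0))
    print(n, round(q_emp, 3), round(q_norm, 3))
```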
This is not just a theoretical curiosity. The p-values I am working with come from Mann-Whitney tests (though many other examples could be given). There is a strictly positive probability (under the null, but also under my $H_1$) that the one-tailed p-value for a given test equals 1. If this happens, the aggregated p-value according to the z-transform method is also 1 (unless some other test yielded a p-value of 0; let us assume this is not the case), simply because the standard normal CDF reaches 1 only in the limit $x \to +\infty$, so a p-value of 1 maps to an infinite z-score that dominates the sum. And this is true however many p-values we are aggregating! As an extreme example: if one p-value is 1, the other 100 are all 0.1%, and I know that a p-value of 1 has a 10% probability of arising by chance, my intuition is that the null hypothesis should be rejected; instead, the aggregated p-value is 1.
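Here is that extreme example in code (the numbers are the ones above; the use of `scipy` is just my own sketch), showing how the single infinite z-score forces the combined p-value to 1:

```python
import numpy as np
from scipy import stats

pvals = np.array([1.0] + [0.001] * 100)
z = stats.norm.isf(pvals)            # Phi^{-1}(1 - p); the p = 1 term is -inf
combined = stats.norm.sf(z.sum() / np.sqrt(len(pvals)))
print(z[0], combined)                # -inf, and the combined p-value is 1.0
```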
What is wrong with just summing up the p-values and comparing the sum to the appropriate normal (or Irwin–Hall, or even a discrete version of it, if accuracy is the concern) distribution?!
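For comparison, the naive sum applied to the same extreme example, using the normal approximation to the Irwin–Hall distribution (again just my own sketch), gives an overwhelmingly significant result, in line with my intuition:

```python
import numpy as np
from scipy import stats

pvals = np.array([1.0] + [0.001] * 100)
n = len(pvals)
t = pvals.sum()                                    # 1 + 100 * 0.001 = 1.1
# Under H0 the sum is Irwin-Hall(n), approximately N(n/2, n/12).
combined = stats.norm.cdf(t, loc=n / 2, scale=np.sqrt(n / 12.0))
print(combined)                                    # astronomically small
```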
Even in cases where the above problem does not arise, I fail to see why extreme p-values should be given more weight; and if there is a reason, how much weight should they be given (i.e., how can the specific transformations used in step 2 be justified)?
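To make the question concrete, here is my own illustration of the weight each transformation assigns to increasingly extreme p-values: Fisher's $-2\ln p$ grows without bound, Stouffer's $\Phi^{-1}(1-p)$ grows much more slowly, and the plain sum uses the p-value itself, which is bounded below by 0 however strong the evidence.

```python
import numpy as np
from scipy import stats

for p in (0.05, 1e-3, 1e-6, 1e-9):
    print(f"p={p:>7.0e}  fisher={-2 * np.log(p):8.1f}  "
          f"stouffer={stats.norm.isf(p):6.2f}  raw={p:.0e}")
```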