Imagine a simple experiment designed to answer a simple question, for example: is body temperature the same in men and in women?
To answer it, let's say you randomly sample 10 men and 10 women from a given city and measure their body temperatures (using the same measurement protocol for everybody, of course).
Then imagine you find a significant difference (alpha = 5%) between these two samples.
You cannot rule out a statistical fluke, can you? (This may be a subsidiary question, and I would be pleased if you could answer it too, but the main question is below.) You may therefore want to repeat the experiment a few times, for example in independent cities, to become very confident that the difference you observed in the first experiment is real.
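To illustrate what a "fluke" means here, the sketch below simulates the experiment many times when there is no real difference between men and women, and counts how often the test is significant anyway. It is a minimal sketch under stated assumptions: body temperature is assumed normal with a known, common standard deviation, so a two-sided z-test can be used with only the standard library (in practice a two-sample t-test would be the usual choice). The mean of 36.8 °C, the standard deviation of 0.4 °C, and the function names are hypothetical, chosen only for illustration.

```python
import math
import random


def z_test_two_sided_p(sample_a, sample_b, sigma):
    """Two-sided z-test for equal means, ASSUMING a known common standard deviation sigma."""
    n_a, n_b = len(sample_a), len(sample_b)
    diff = sum(sample_a) / n_a - sum(sample_b) / n_b
    se = sigma * math.sqrt(1 / n_a + 1 / n_b)
    z = diff / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)), which equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_experiments=20000, n_per_group=10,
                        mean=36.8, sigma=0.4, alpha=0.05, seed=1):
    """Fraction of simulated experiments that are 'significant' although
    men and women are drawn from the SAME distribution (hypothetical values)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        men = [rng.gauss(mean, sigma) for _ in range(n_per_group)]
        women = [rng.gauss(mean, sigma) for _ in range(n_per_group)]
        if z_test_two_sided_p(men, women, sigma) < alpha:
            hits += 1
    return hits / n_experiments


rate = false_positive_rate()
print(f"Share of significant results with no real difference: {rate:.3f}")
```

The printed share should be close to alpha (about 5%): that is the chance of a fluke in any single experiment. If there is no real difference and the replications are independent, the chance that all k of them are significant by chance is alpha**k.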
Now imagine that you repeat this experiment 8 times (including the first one) and observe a significant difference between men and women in 4 of them.
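One common way to use only the count of significant results (sometimes called vote counting) is a binomial calculation, sketched here under stated assumptions: the 8 tests are independent, there is no real difference in any city (the global null hypothesis), and each test therefore has exactly an alpha = 5% chance of being significant. Under those assumptions the number of significant tests follows Binomial(8, 0.05), and the probability of seeing 4 or more is the tail sum below. The function name is hypothetical.

```python
from math import comb


def prob_at_least_k_significant(n_tests, k_significant, alpha=0.05):
    """P(at least k_significant of n_tests independent tests are significant)
    ASSUMING no real difference anywhere, so each test is significant with probability alpha."""
    return sum(
        comb(n_tests, j) * alpha**j * (1 - alpha) ** (n_tests - j)
        for j in range(k_significant, n_tests + 1)
    )


p_overall = prob_at_least_k_significant(8, 4, alpha=0.05)
print(f"P(>= 4 of 8 significant | no real difference) = {p_overall:.6f}")
```

For 4 out of 8 this gives roughly 0.00037. The calculation ignores the direction of each difference and the actual size of each p-value, so it uses less information than a method that combines the p-values themselves.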
My question is: how confident can I be that the difference is real, if the only information I have is that 4 out of 8 independent tests were significant at alpha = 5%? (Or, to paraphrase: how can I calculate an overall p-value when all I have is the p-value from each repeated experiment? Do I perhaps need additional information?)
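If the individual p-values are available, a standard alternative is Fisher's method (named here as a swapped-in technique, not something proposed in the question). Under the assumptions that the k tests are independent and that every null hypothesis is true, X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom. For even degrees of freedom the chi-square upper tail has the closed form e^(-x/2) Σ_{j=0}^{k-1} (x/2)^j / j!, so the minimal sketch below needs only the standard library; the function name is hypothetical. Like the count-based approach, it does not use the direction of the differences, so two-sided p-values pointing in opposite directions would still reinforce each other.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method: combine independent p-values into one overall p-value.

    ASSUMES independent tests and p-values that are uniform on (0, 1] under the null.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # Test statistic X = -2 * sum(ln p_i), chi-square with 2k degrees of freedom under the null
    x = -2.0 * sum(math.log(p) for p in p_values)
    # Closed-form upper tail of chi-square with even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total
```

Called with a list of hypothetical p-values, one per experiment, such as `fisher_combined_p([0.01, 0.04, 0.03, 0.02, 0.40, 0.55, 0.30, 0.70])`, it returns a single overall p-value for all 8 experiments; with a single p-value it simply returns that p-value.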
(This is a simplified example meant to help me think about a much more complicated real problem.)