
The reproducibility crisis has given many pause over the value (?) of $p$-values as a measure of the relevance of statistical findings. Given the interpretation of a $p$-value and some knowledge of probability, it is not surprising that so many confirmatory studies fail to show $p<0.05$ when the originating study had $p<0.05$: unless the confirmatory study's power is very high, that is guaranteed to happen at a rate much higher than $0.05$. The bit I struggle with is whether such a failure in fact confirms or disproves the originating study.
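
To make that concrete, here is a minimal simulation sketch. The setup (a two-sample t-test, a true standardized effect of 0.4, and 50 observations per arm) is purely illustrative and not taken from any actual study; it just shows that, with realistic power, a replication misses $p<0.05$ far more often than 5% of the time even though the effect is real and the original study was significant.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative assumptions (not from any real study): true effect d = 0.4 SDs,
# n = 50 per arm, which gives power of roughly 50%.
d, n, n_sim = 0.4, 50, 20_000

def one_study():
    """Run one two-sample t-test; return its p-value and estimated difference."""
    x = rng.normal(0.0, 1.0, n)          # control arm
    y = rng.normal(d, 1.0, n)            # treatment arm, true shift = d
    res = stats.ttest_ind(y, x)
    return res.pvalue, y.mean() - x.mean()

# Simulate independent (original, replication) pairs of studies.
orig = np.array([one_study() for _ in range(n_sim)])
rep = np.array([one_study() for _ in range(n_sim)])

orig_sig = orig[:, 0] < 0.05
rep_fail = rep[orig_sig, 0] >= 0.05
print(f"Originals reaching p < 0.05 (the power):   {orig_sig.mean():.2f}")
print(f"Replications missing p < 0.05, given that: {rep_fail.mean():.2f}")
```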

One thought is: why aren't these studies being compared in terms of their confidence intervals? If the originating study is declared statistically significant because its 95% CI excludes the null-hypothesized value (equivalent to $p$-value-based inference), isn't it far more plausible that a confirmatory study produces an effect estimate lying within that 95% CI, even when the confirmatory study lacks statistical significance on its own?
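
Again purely as a sketch (the same assumed effect size and sample size as above, with a z-test and known variance just to keep the interval arithmetic simple), one can compare how often the replication's estimate falls inside the original study's 95% CI versus how often the replication itself reaches $p<0.05$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Same illustrative assumptions as before: true difference 0.4, n = 50 per arm,
# unit variance, so the standard error of the difference in means is known.
d, n, n_sim = 0.4, 50, 20_000
se = np.sqrt(2.0 / n)
z = stats.norm.ppf(0.975)

def estimate():
    """Return the estimated mean difference and its two-sided z-test p-value."""
    diff = rng.normal(d, 1.0, n).mean() - rng.normal(0.0, 1.0, n).mean()
    return diff, 2 * stats.norm.sf(abs(diff) / se)

orig = np.array([estimate() for _ in range(n_sim)])
rep = np.array([estimate() for _ in range(n_sim)])

sig = orig[:, 1] < 0.05                                 # originals declared significant
lo, hi = orig[sig, 0] - z * se, orig[sig, 0] + z * se   # their 95% CIs
inside = (rep[sig, 0] > lo) & (rep[sig, 0] < hi)        # replication estimate inside original CI
rep_sig = rep[sig, 1] < 0.05                            # replication significant on its own

print(f"Replication estimate inside original 95% CI: {inside.mean():.2f}")
print(f"Replication itself reaches p < 0.05:         {rep_sig.mean():.2f}")
```

With these made-up numbers the replication estimate lands inside the original CI far more often than the replication reaches significance on its own, which is the contrast I have in mind.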

Does this imply that the usual basis for evaluating the reproducibility of studies (requiring the confirmatory study to reach statistical significance itself, rather than checking agreement with the original estimate) is the wrong one?

  • Very relevant: http://datacolada.org/47 – amoeba Nov 12 '18 at 15:58
  • Harry Crane adds some really interesting thoughts to this discussion in a slightly different yet closely connected debate. Check [here](https://arxiv.org/abs/1711.07801). – MauOlivares Nov 12 '18 at 21:32

0 Answers