
The p-postulate is the notion that equal p-values provide equal evidence against the null hypothesis.

Wagenmakers et al. (2008) write:

If p-values truly reflect evidence, a minimum requirement is that equal p-values provide equal evidence against the null hypothesis (i.e., the p-postulate). According to the p-postulate, p = .05 with 10 observations constitutes just as much evidence against the null hypothesis as does p = .05 after 50 observations.

They cite Royall (1986) as the source for their definition of the p-postulate, and they go on to argue that the postulate is false. Is it?
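To make the scenario concrete, here is a small worked sketch (an illustration added here, not part of the original question), assuming a two-sided one-sample t-test and using scipy: it computes the observed effect size (Cohen's d) that would land exactly at p = .05 with n = 10 versus n = 50.

```python
from scipy import stats
import numpy as np

# For a two-sided one-sample t-test, p = .05 means the t statistic sits
# exactly at the .975 quantile of the t distribution with n - 1 df.
for n in (10, 50):
    t_crit = stats.t.ppf(0.975, df=n - 1)
    d = t_crit / np.sqrt(n)   # implied Cohen's d, since t = d * sqrt(n)
    print(f"n={n:2d}: t = {t_crit:.3f}, implied effect size d = {d:.3f}")
```

The same p = .05 corresponds to an observed d of roughly 0.72 at n = 10 but only about 0.28 at n = 50, which is exactly the kind of pair of results the p-postulate asks us to treat as equally strong evidence.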


  • What do you understand "evidence" to be here? Do you have an exact definition in mind? – Erik May 03 '13 at 10:32
  • I do not have an exact definition in mind. I'm open to the answer that it depends on what definition of evidence one adopts. In that case, I'm curious about what definitions respondents feel are most defensible, and why they suggest the p-postulate is true or false. – user1205901 - Reinstate Monica May 04 '13 at 01:08
  • Maybe this helps http://stats.stackexchange.com/questions/166323/misunderstanding-a-p-value/166327#166327 –  Jun 14 '16 at 15:21

1 Answer


The Royall paper begins with two quotes that provide apparently contradictory interpretations of the p-value. Both rely on interpreting the p-value in light of the sample size, and as such both are flawed interpretations of the p-value.

A p-value tells us one thing and one thing only: the probability of observing a statistic as extreme as or more extreme than the one observed in a sample, as a result of random sampling error, if the null hypothesis is true. A p-value of .05 with a sample of 10 or a sample of 50 (or any sample size, for that matter) yields the same interpretation: under the assumptions of the model, a difference of the observed magnitude or greater would be seen in just 5% of samples if the null hypothesis were actually true.
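As a hedged illustration of that interpretation (an editorial sketch, assuming a one-sample t-test and scipy; the specific simulation is not from the answer): when the null hypothesis is actually true, about 5% of samples give p ≤ .05 whether n is 10 or 50.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulate a true null (the population mean really is 0): the share of
# samples with p <= .05 should be about 5% whether n = 10 or n = 50.
for n in (10, 50):
    pvals = np.array([stats.ttest_1samp(rng.normal(0.0, 1.0, n), 0.0).pvalue
                      for _ in range(20_000)])
    print(f"n={n:2d}: share of p <= .05 is {np.mean(pvals <= 0.05):.3f}")
```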

So, in response to your specific question and focusing on interpreting only the p-value, the answer is yes--equal p-values provide equal evidence against the null hypothesis at any sample size.

This does not tell us anything about the magnitude of the difference or the effect size. Indeed, all else being equal, the same observed effect size will yield a lower p-value as the sample size increases. Strength of evidence against the null (the p-value) and the magnitude of the difference (the effect size) should be interpreted together.
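A small worked sketch of that last point (an added illustration; the mean difference of 0.5 and sample standard deviation of 1.0 are hypothetical numbers, and the one-sample t-test is an assumed choice): holding the observed summary statistics fixed, increasing n alone drives the computed p-value down.

```python
import numpy as np
from scipy import stats

# Hypothetical summary statistics held fixed while n grows.
diff, sd = 0.5, 1.0
for n in (10, 50, 200):
    se = sd / np.sqrt(n)                  # standard error shrinks with n
    t = diff / se                         # so the t statistic grows
    p = 2 * stats.t.sf(abs(t), df=n - 1)  # two-sided p-value
    print(f"n={n:3d}: t = {t:5.2f}, p = {p:.4f}")
```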

Brett
  • (1) Royall's paper starts here and confronts this with two other (conflicting) claims about p-values. He concludes with "despite their apparent inconsistencies, interpretations can be given of all three statements ... that make each of them correct." It would be nice to see an explicit acknowledgment of, and response to, that conclusion. (2) re your last line: given that p-values are random, "will" is too strong to be correct. Perhaps you mean "has a greater chance to"? – whuber Jun 14 '16 at 15:10
  • @whuber 1. I edited to address. 2. I added "observed" to effect size, otherwise this is a mathematical result, no? All else equal in the inputs, increase n in calculating p = smaller p? – Brett Jun 14 '16 at 15:57
  • It's hard to see how one could keep everything equal in the inputs while increasing $n$. The closest I can imagine is a sequence of independent draws from a distribution--but even then, the p-values constructed from the first $2, 3, \ldots, n$ elements of the sequence can actually *increase* almost as often as they decrease. – whuber Jun 14 '16 at 19:31
  • I am talking about the calculation in a hypothetical situation. Increase n and p decreases. – Brett Jun 14 '16 at 20:19
  • As far as I can tell, there is no such hypothetical situation! To increase $n$, *you have to obtain one more* random *result.* Because it is random, frequently the p-value goes up. – whuber Jun 14 '16 at 20:34
  • $\sigma_{\bar{X}}=\sigma/\sqrt{n}$. For example, same difference in means, smaller $\sigma_{\bar{X}}$ = larger t and smaller p? – Brett Jun 14 '16 at 20:43
  • That's true for the *sampling distribution*--but p-values tell us about *data.* The regularity you note holds for the sample standard deviation only *on average.* That's why "will" is too definite. It's similar to saying that a ten-year old *will* be taller than a nine-year old. Although it's a reasonable bet, you would lose it a sizable amount of the time. – whuber Jun 14 '16 at 20:47
  • Right, and in the calculation of your usual test of significance, we estimate the S.E. from the sample standard deviation and sample size. Increase n and the SE decreases and the test statistic is more extreme when the sample s.d. and observed difference are the same. Formulaically, this is true. – Brett Jun 14 '16 at 21:03
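To illustrate both sides of this exchange (an editorial sketch with hypothetical numbers, assuming scipy): with the summary statistics held fixed, a larger n can only shrink the computed p-value, but when each increase in n brings a genuinely new random observation, the p-value recomputed on the growing sample frequently rises as well as falls.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(0.3, 1.0, 100)   # hypothetical data with a true mean of 0.3

# Recompute the one-sample t-test p-value as each new observation arrives.
pvals = [stats.ttest_1samp(x[:n], 0.0).pvalue for n in range(3, 101)]
ups = sum(b > a for a, b in zip(pvals, pvals[1:]))
print(f"The p-value went up on {ups} of {len(pvals) - 1} added observations.")
```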