Since one can calculate confidence intervals for p-values, and since the opposite of interval estimation is point estimation: is the p-value a point estimate?


- I don't believe one *can* calculate confidence intervals for a p-value; it's a statistic calculated from the data, not a parameter describing the data-generating process. Of course you can still ask what a statistic estimates. – Scortchi - Reinstate Monica Nov 13 '15 at 14:29
- @Scortchi: but if I were to apply e.g. bootstrapping to compute a distribution of p-values and then were to construct a 95% percentile interval of this bootstrapped distribution, then if it's not a confidence interval for the p-value -- *what is it*? – amoeba Nov 13 '15 at 21:27
- @amoeba: a confidence interval is about an unknown parameter, while your bootstrap interval is an approximation of a 95% region for a statistic. – Xi'an Nov 13 '15 at 21:32
- @Scortchi: I have seen software that prints CIs for p-values. In this case, the approximate p-values were calculated by permutation tests, so if the CI was too wide (i.e. it overlapped both $[0, 0.05]$ and $[0.05, 1]$), you would use more permutations before making inference. – Cliff AB Nov 13 '15 at 21:52
- @Cliff That's not a confidence interval for the p-value *qua* property of a distribution: that's a confidence interval for a stochastic estimator of the p-value of a test for a particular sample. Although they sound similar, and both are intervals, they are completely different things. – whuber Nov 13 '15 at 22:07
- @whuber: completely agree. But I believe it's not a completely wrong interpretation to call this simply a "confidence interval for a p-value", as the OP stated. I think Scortchi interpreted what they had posted as stating that a p-value was a parameter based on the *population of interest in the study*, which doesn't really make sense. But considering a p-value as a parameter of interest based on *your sample and statistical model* does not seem wrong to me. Often this parameter is known exactly, but occasionally it is estimated. – Cliff AB Nov 13 '15 at 22:21
- Thinking of it this way is a different angle than standard statistics (we usually only think of parameters coming from populations and estimates coming from samples), but as I noted, this slightly twisted view is not purely theoretical; you can get outputs from published statistical software that will present a CI for a p-value. – Cliff AB Nov 13 '15 at 22:25
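The permutation-test scenario Cliff AB describes (and the bootstrap idea amoeba raises) can be made concrete. A minimal Python sketch, my illustration rather than code from the thread: when a permutation p-value is approximated by Monte Carlo, the reported number is a binomial proportion, so it carries an ordinary standard error and confidence interval; if that interval straddles the decision threshold, more permutations are warranted.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 30)   # control sample
y = rng.normal(0.5, 1.0, 30)   # treatment sample

observed = y.mean() - x.mean()
pooled = np.concatenate([x, y])

B = 2000  # number of random permutations
count = 0
for _ in range(B):
    perm = rng.permutation(pooled)
    # One-sided test: how often does a random relabelling produce a
    # difference in means at least as large as the observed one?
    if perm[30:].mean() - perm[:30].mean() >= observed:
        count += 1

p_hat = count / B                       # Monte Carlo estimate of the p-value
se = np.sqrt(p_hat * (1 - p_hat) / B)   # binomial standard error
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"p ~ {p_hat:.4f}, 95% CI ~ ({lo:.4f}, {hi:.4f})")
# If this interval straddles 0.05, run more permutations before deciding.
```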
3 Answers
Point estimates and confidence intervals are for parameters that describe a distribution, e.g. its mean or standard deviation.
But unlike other sample statistics, such as the sample mean and the sample standard deviation, the p-value is not a useful estimator of an interesting distribution parameter. Look at the answer by @whuber for technical details.
The p-value for a test statistic gives the probability of observing a deviation from the expected value of the test statistic at least as large as the one observed in the sample, calculated under the assumption that the null hypothesis is true. If you have the entire distribution, it is either consistent with the null hypothesis or it is not. This can be described by an indicator variable (again, see the answer by @whuber).
But the p-value cannot be used as a useful estimator of that indicator variable, because it is not consistent: the p-value does not converge as the sample size increases if the null hypothesis is true. This is a rather complicated alternative way of stating that a statistical test can either reject or fail to reject the null, but never confirm it.
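A minimal simulation sketch of that non-convergence, in Python (my illustration, not part of the original answer; assumes NumPy and SciPy): under a true null hypothesis the p-value of a one-sample t-test is approximately uniform on $(0, 1)$ at every sample size, so it never settles down as $n$ grows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Under H0 (the true mean really is 0), the t-test p-value is roughly
# Uniform(0, 1) at *any* sample size, so it cannot converge as n grows.
for n in (20, 200, 2000):
    pvals = [stats.ttest_1samp(rng.normal(0.0, 1.0, n), 0.0).pvalue
             for _ in range(5000)]
    print(f"n={n}: mean p = {np.mean(pvals):.3f}, sd p = {np.std(pvals):.3f}")

# The mean stays near 0.5 and the spread near sqrt(1/12) = 0.289 for
# every n: no convergence, hence no consistency under the null.
```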

- Most of the better accounts of statistical tests (Lehmann, Kiefer, etc.) do not refer to "populations" at all, but instead frame the situation in terms of estimating parameters of *distributions.* This does not require the randomness to be due solely to sampling, and thereby allows the theory more broadly to apply to situations where the randomness is part of a *model*. – whuber Nov 13 '15 at 13:30
- But you have explicitly contradicted that with the statement there "are no probabilities associated with the population at all." Please note, too, that *all* estimators are "explicitly defined on sample level." It is therefore difficult to determine what distinction you are trying to make in this post. – whuber Nov 13 '15 at 13:40
- Even with a distribution, doesn't the probability come into play when sampling from the distribution? And the estimators are defined on sample level, but what they estimate is defined on population level. – Erik Nov 13 '15 at 13:41
- (-1) I agree with both @Tim's common-sensical answer & whuber's recondite answer, but am struggling to make any sense of this one. (1) "But the p-value is not a population parameter since it is explicitly defined on sample level": this is doubtless worth pointing out, but the "but" makes it seem like you're saying that a p-value can't be an estimate of anything because it's a sample statistic, as if the sample mean couldn't be an estimate of anything because it's a sample statistic. ... – Scortchi - Reinstate Monica Nov 13 '15 at 15:41
- (2) "This is because there are no probabilities associated with the population at all, it is regarded as fixed but unknown": (a) The p-value isn't calculated from the sample *because* "there are no probabilities [...]"; (b) as @whuber's pointed out, sampling from a finite population is a special case; (c) in any case it just doesn't follow from what you've said that the p-value doesn't estimate anything about the population. – Scortchi - Reinstate Monica Nov 13 '15 at 15:43
- @Scortchi Fair enough, and thanks for the detailed comment. I think the point is that, unlike other sample statistics, it is not a useful estimator of any parameter of the distribution. Will try to clarify that point. – Erik Nov 13 '15 at 20:41
- @Erik: I disagree with the final sentence since [we proved](http://projecteuclid.org/euclid.aos/1176348534) it can be an admissible estimator. – Xi'an Nov 13 '15 at 21:04
- @Xi'an But that is an (admittedly interesting) decision-theoretic viewpoint. An answer from that perspective would be helpful, so at least I would be happy if you decided to write one. – Erik Nov 13 '15 at 21:26
Yes, it could be (and has been) argued that a p-value is a point estimate.
In order to identify whatever property of a distribution a p-value might estimate, we would have to assume it is asymptotically unbiased. But, asymptotically, the mean p-value for the null hypothesis is $1/2$ (ideally; for some tests it might be some other nonzero number) and for any other hypothesis it is $0$. Thus, the p-value could be considered an estimator of one-half the indicator function for the null hypothesis.
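This asymptotic behaviour is easy to check by simulation. A minimal Python sketch (my illustration, using a one-sample t-test as the example; assumes NumPy and SciPy):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Mean p-value of a one-sample t-test of H0: mu = 0, when the data are
# drawn under the null (mu = 0) and under an alternative (mu = 0.5).
for mu, label in ((0.0, "null"), (0.5, "alternative")):
    for n in (10, 100, 1000):
        pvals = [stats.ttest_1samp(rng.normal(mu, 1.0, n), 0.0).pvalue
                 for _ in range(2000)]
        print(f"{label:11s} n={n:4d}: mean p = {np.mean(pvals):.3f}")

# Under the null the mean p-value stays near 1/2; under the alternative
# it tends to 0 -- the behaviour of an estimator of
# (1/2) * indicator(null hypothesis is true).
```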
Admittedly it takes some creativity to view a p-value in this way. We could do a little better by viewing the estimator in question as the decision we make by means of the p-value: is the underlying distribution a member of the null hypothesis or of the alternate hypothesis? Let's call this set of possible decisions $D$. Jack Kiefer writes
We suppose that there is an experiment whose outcome the statistician can observe. This outcome is described by a random variable or random vector $X$ ... . The probability law of $X$ is unknown to the statistician, but it is known that the distribution function $F$ of $X$ is a member of a specified class $\Omega$ of distribution functions. ...
A statistical problem is said to be a problem of point estimation if $D$ is the collection of possible values of some real or vector-valued property of $F$ which depends on $F$ in a reasonably smooth way.
In this case, because $D$ is discrete, "reasonably smooth" is not a restriction at all. Kiefer's terminology reflects this by referring to statistical procedures with discrete decision spaces as "tests" instead of "point estimators."
Although it is interesting to explore the limits (and limitations) of such definitions, as this question invites us to do, perhaps we should not insist too strongly that a p-value is a point estimator, because this distinction between estimators and tests is both useful and conventional.
In a comment to this question, Christian Robert brought attention to a 1992 paper where he and co-authors took exactly this point of view and analyzed the admissibility of the p-value as an estimator of the indicator function. See the link in the references below. The paper begins,
Approaches to hypothesis testing have usually treated the problem of testing as one of decision-making rather than estimation. More precisely, a formal hypothesis test will result in a conclusion as to whether a hypothesis is true, and not provide a measure of evidence to associate with that conclusion. In this paper we consider hypothesis testing as an estimation problem within a decision-theoretic framework ... .
[Emphasis added.]
References
Jiunn Tzon Hwang, George Casella, Christian Robert, Martin T. Wells, and Roger H. Farrell, Estimation of Accuracy in Testing. Ann. Statist. Volume 20, Number 1 (1992), 490-509. Open access.
Jack Carl Kiefer, Introduction to Statistical Inference. Springer-Verlag, 1987.
- Hmm. I am not sure if this view is helpful. For one, in this sense the p-value is not a good estimator, since it is not consistent if the null hypothesis is true. And in some cases (you mention that) it has a sample-size-dependent bias as well. It might be technically true, but any random number could be a (terrible) estimator for any parameter as well. – Erik Nov 13 '15 at 13:46
- The question does not ask whether the p-value is a *good* estimator, @Erik. As an estimator, it has obvious deficiencies. For instance, its asymptotic variance for the null hypothesis is nonzero. Please note that the bias of almost *every* asymptotically unbiased estimator depends on sample size. Although you are correct that an independent random number could be viewed as an estimator, it would be an estimator of something different: it would estimate its own mean (by definition). Thus your objections appear not to have any relevance to the question at hand. – whuber Nov 13 '15 at 13:49
- What the estimator estimates is part of the definition. You can't say that the random number would by definition estimate its own mean. There is no reason why I can't define e.g. the sample standard deviation as an estimate of the mean or vice versa. It's just a bad estimator, like the p-value for the indicator function. What I am trying to express is that seeing the p-value as a point estimate might be correct on some technical level, but that it is not helpful. When speaking of estimators for a specific parameter, at least consistency is implicitly assumed unless stated otherwise. – Erik Nov 13 '15 at 14:05
- BTW I still upvoted the answer as it provides a helpful additional perspective from a technical point of view. – Erik Nov 13 '15 at 14:06
- I don't think we differ on any of those points, @Erik, except perhaps the "unhelpful" part. As Nick Cox points out in a comment elsewhere in this thread, it is nevertheless *interesting* to contemplate the sense in which a p-value could be considered an estimator and what, exactly, it could possibly be estimating. That can help us understand a little better just what a p-value is (and is not). Many would view that as a *helpful* exercise. – whuber Nov 13 '15 at 14:10
- Maybe I was imprecise. I agree that it is a helpful and interesting view when thinking about estimators. But I don't think it adds much intuition or understanding when thinking about p-values, and I saw the focus of the question more on p-values than on estimators. – Erik Nov 13 '15 at 14:13
- In a [1992 paper](http://projecteuclid.org/euclid.aos/1176348534), we study the $p$-value as an estimator of the indicator function $\mathbb{I}_{\Theta_0}(\theta)$ and demonstrate that it can be an admissible estimator for one-sided hypotheses but cannot be admissible for two-sided hypotheses. – Xi'an Nov 13 '15 at 21:02
- @Xi'an I see we're only 23 years behind you... Thank you for the reference! – whuber Nov 13 '15 at 22:32
- "Thus, the p-value could be considered an estimator of one-half the indicator function for the alternate hypothesis." Shouldn't that read "the indicator function for the null hypothesis"? – Andrew M Nov 13 '15 at 23:51
- @whuber: thanks for bringing Jack Kiefer's views back to life! And for including our AoS reference. You have my vote for this being _the_ answer! – Xi'an Nov 14 '15 at 14:16
- @Andrew You're correct--I mixed up the language. I'll fix that. Thank you for noticing this! – whuber Nov 14 '15 at 15:57
$p$-values are not used for estimating any parameter of interest; they are used for hypothesis testing. For example, you could be interested in estimating the population mean $\mu$ based on the sample you have, or in an interval estimate of it, but in a hypothesis-testing scenario you would rather compare the sample mean $\overline{x}$ with the population mean $\mu$ to see if they differ. In that scenario you are not interested in the particular value of the $p$-value, but rather in whether it falls below some threshold (e.g. $p < 0.05$): you want to know whether your data provide enough evidence against the null hypothesis. You would not compare different $p$-values to each other, but rather use each of them to make a separate decision about your hypotheses. You don't really want to know anything about the null hypothesis beyond whether you can reject it or not. This makes their values inseparable from the decision context, and so they differ from point estimates, because with point estimates we are interested in their values per se.
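To make this decision-style usage concrete, here is a minimal Python sketch (my illustration; the one-sample t-test and the threshold $\alpha = 0.05$ are arbitrary choices): the p-value feeds a single reject / fail-to-reject decision rather than being reported as an estimate of some quantity of interest.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample = rng.normal(0.3, 1.0, 50)  # data with true mean 0.3

alpha = 0.05  # significance level chosen in advance
p = stats.ttest_1samp(sample, 0.0).pvalue  # test H0: mu = 0

# The p-value's only role here is the comparison with alpha.
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"p = {p:.4f} -> {decision}")
```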
- Your initial statement correctly echoes how things are often explained, but nevertheless it does not go deep enough. A basic fact here is sampling variation, the variability from sample to sample. Take a different sample, and your P-value will be different. It takes a little ingenuity to see precisely what it is estimating, and it is not (as far as I know) **conventional** to explain it as estimating a parameter, but that point of view makes perfect sense. See @whuber's interesting answer. (The entire territory is littered with muddy paraphrases based on the need to simplify for teaching.) – Nick Cox Nov 13 '15 at 13:54
- It's referring to Tim's answer. (Comments on your question would belong under your question.) – Nick Cox Nov 13 '15 at 14:08
- @NickCox p-values are point estimates, but we are not interested in their point values; my answer refers to their *usage*. They are not used as point estimates, and are useless as such. Those cases were already discussed in other answers and comments. – Tim Nov 13 '15 at 14:11
- How terms are used is interesting and important (and a personal preoccupation, by the way). The question remains what a P-value **is**. This too is pointed out [inevitable pun here] elsewhere in this thread. It's a helpful convention to regard parameters as those unknowns which appear in a model specification, but there are other unknowns too. – Nick Cox Nov 13 '15 at 14:16
- @Tim why are they useless as point estimators? I almost never see an interval for them, so I guess the usual reported p-value is a point estimate and it's used as you described. – 00schneider Nov 13 '15 at 14:35
- @Tim, I think this claim (from your last comment) is almost always not true, at least in biology. People are very much interested in the value of p-values, marking $p<0.05$, $p<0.01$, $p<0.001$ with one, two, or three stars on the figures, writing about something being "highly significant", etc. The usual recommendation is also to report exact p-values, e.g. $p=0.003$, and not $p<0.05$. Only very rarely do people adhere to the strict Neyman-Pearson framework, choose $\alpha$ in advance, and report all p-values as $p<\alpha$. – amoeba Nov 13 '15 at 15:28
- This question intersects with many others, most of which are highly controversial. One is the idealisation that the purpose of a test is to make a yes-or-no decision, which doesn't match all problems at all. Another key fact is that for decades people relied on published tables of critical values, so exact P-values were out of reach until computers came into common use. – Nick Cox Nov 13 '15 at 15:58
- @00schneider: If you do ever see an interval given for p-values, it's very unlikely to be a confidence interval for the population parameter defined by whuber. Tim's point is that there's no need to consider them as *estimating* anything at all, interesting though it may be to do so. – Scortchi - Reinstate Monica Nov 13 '15 at 16:32