Questions tagged [p-value]

In frequentist hypothesis testing, the $p$-value is the probability of obtaining a result at least as extreme as the observed result, under the assumption that the null hypothesis is true. (Extremity is defined with respect to the likelihood ratio of the alternative vs. the null hypothesis, so extremity depends on the alternative.) When the $p$-value is small, the observed data would be unlikely to occur if the null were true; a low $p$-value is therefore typically interpreted as evidence against the null hypothesis. The most common cutoff for a "small" $p$-value in research is $.05$.
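
As a minimal sketch of how such a $p$-value arises in practice (an R illustration on simulated data, not tied to any specific question below): a one-sample $t$-test of $H_0: \mu = 0$, with the two-sided $p$-value taken both from t.test() and computed directly from the $t$ statistic.

    # Simulated sample whose true mean is 0.5; test H0: mu = 0 vs. H1: mu != 0
    set.seed(1)
    x <- rnorm(30, mean = 0.5, sd = 1)

    t.test(x, mu = 0)$p.value        # two-sided p-value reported by t.test()

    # The same p-value "by hand": probability, under t(df = n - 1), of a
    # t statistic at least as extreme (in either direction) as the observed one
    t_obs <- (mean(x) - 0) / (sd(x) / sqrt(length(x)))
    2 * pt(-abs(t_obs), df = length(x) - 1)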

2483 questions
280 votes · 16 answers

What is the meaning of p values and t values in statistical tests?

After taking a statistics course and then trying to help fellow students, I noticed one subject that inspires much head-desk banging is interpreting the results of statistical hypothesis tests. It seems that students easily learn how to perform the…
152 votes · 6 answers

Why are p-values uniformly distributed under the null hypothesis?

Recently, I found a statement in a paper by Klammer et al. that p-values should be uniformly distributed. I believe the authors, but cannot understand why it is so. Klammer, A. A., Park, C. Y., and Stafford Noble, W. (2009) Statistical…
golobor · 1,543
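
A small simulation sketch of the fact this question asks about (not taken from its answers): generate data under the null many times, and the resulting $p$-values are approximately uniform on $[0, 1]$.

    # Many samples with true mean 0, each tested against H0: mu = 0
    set.seed(1)
    p <- replicate(10000, t.test(rnorm(20), mu = 0)$p.value)

    hist(p, breaks = 20)   # histogram is roughly flat on [0, 1]
    mean(p < 0.05)         # close to the nominal 0.05, as a uniform p-value implies
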
116 votes · 10 answers

ASA discusses limitations of $p$-values - what are the alternatives?

We already have multiple threads tagged as p-values that reveal lots of misunderstandings about them. Ten months ago we had a thread about a psychological journal that "banned" $p$-values; now the American Statistical Association (2016) says that with our…
Tim · 108,699
102 votes · 9 answers

Is this really how p-values work? Can a million research papers per year be based on pure randomness?

I'm very new to statistics, and I'm just learning to understand the basics, including $p$-values. But there is a huge question mark in my mind right now, and I kind of hope my understanding is wrong. Here's my thought process: Aren't all researches…
n_mu_sigma · 1,071
95 votes · 2 answers

How much do we know about p-hacking "in the wild"?

The phrase p-hacking (also: "data dredging", "snooping" or "fishing") refers to various kinds of statistical malpractice in which results become artificially statistically significant. There are many ways to procure a "more significant" result,…
91 votes · 4 answers

When to use Fisher and Neyman-Pearson framework?

I've been reading a lot lately about the differences between Fisher's method of hypothesis testing and the Neyman-Pearson school of thought. My question is: ignoring philosophical objections for a moment, when should we use Fisher's approach of…
Stijn · 1,550
81 votes · 9 answers

Regarding p-values, why 1% and 5%? Why not 6% or 10%?

Regarding p-values, I am wondering why $1$% and $5$% seem to be the gold standard for "statistical significance". Why not other values, like $6$% or $10$%? Is there a fundamental mathematical reason for this, or is this just a widely held…
Contango · 1,387
81 votes · 11 answers

How to obtain the p-value (check significance) of an effect in a lme4 mixed model?

I use lme4 in R to fit the mixed model lmer(value~status+(1|experiment)) where value is continuous, status and experiment are factors, and I get Linear mixed model fit by REML Formula: value ~ status + (1 | experiment) AIC BIC logLik…
ECII · 1,791
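
A sketch of one commonly suggested way to attach a $p$-value to the status effect in a model of this form: a likelihood-ratio test on nested fits, with the lmerTest route noted as an alternative. The data frame dat below is hypothetical, simulated only so the code runs; it is not from the question, and lme4 is assumed to be installed.

    library(lme4)

    # Hypothetical data in the shape the question describes: continuous value,
    # a two-level status factor, and an experiment grouping factor
    set.seed(1)
    dat <- data.frame(
      value      = rnorm(120),
      status     = factor(rep(c("A", "B"), 60)),
      experiment = factor(rep(1:6, each = 20))
    )

    # Likelihood-ratio test for status: compare nested models fit by ML
    # (REML fits are not comparable this way)
    m1 <- lmer(value ~ status + (1 | experiment), data = dat, REML = FALSE)
    m0 <- lmer(value ~ 1 + (1 | experiment), data = dat, REML = FALSE)
    anova(m0, m1)   # the "Pr(>Chisq)" column is the p-value for status

    # Alternative: the lmerTest package adds Satterthwaite-approximation
    # p-values to summary() for models fit with its own lmer()
    # library(lmerTest); summary(lmer(value ~ status + (1 | experiment), data = dat))
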
75 votes · 4 answers

How should tiny $p$-values be reported? (and why does R put a minimum on 2.22e-16?)

For some tests in R, there is a lower limit on the p-value calculations of $2.22 \cdot 10^{-16}$. I'm not sure why it's this number, if there is a good reason for it or if it's just arbitrary. A lot of other stats packages just go to 0.0001, so this…
paul · 1,342
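
For context (a sketch of the mechanism, not the accepted answer): $2.22 \cdot 10^{-16}$ is R's double-precision machine epsilon, and format.pval() is the helper R's standard test printouts use to decide when a $p$-value is shown as an inequality rather than a number.

    .Machine$double.eps                 # 2.220446e-16, the default cutoff

    # p-values below `eps` are formatted as "< eps" instead of as a number
    format.pval(c(0.03, 1e-50))         # the second value is reported as below eps
    format.pval(1e-50, eps = 1e-100)    # lowering eps prints the tiny value itself
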
73 votes · 4 answers

A psychology journal banned p-values and confidence intervals; is it indeed wise to stop using them?

On 25 February 2015, the journal Basic and Applied Social Psychology issued an editorial banning $p$-values and confidence intervals from all future papers. Specifically, they say (formatting and emphasis are mine): [...] prior to publication,…
amoeba · 93,463
71 votes · 3 answers

Is this the solution to the p-value problem?

In February 2016, the American Statistical Association released a formal statement on statistical significance and p-values. Our thread about it discusses these issues extensively. However, no authority has come forth to offer a universally…
whuber · 281,159
69 votes · 8 answers

What is a good, convincing example in which p-values are useful?

My question in the title is self explanatory, but I would like to give it some context. The ASA released a statement earlier this week “on p-values: context, process, and purpose”, outlining various common misconceptions of the p-value, and urging…
Tal Galili · 19,935
67 votes · 3 answers

References containing arguments against null hypothesis significance testing?

In the last few years I've read a number of papers arguing against the use of null hypothesis significance testing in science, but didn't think to keep a persistent list. A colleague recently asked me for such a list, so I thought I'd ask everyone…
63 votes · 3 answers

Explain the xkcd jelly bean comic: What makes it funny?

I see that one time out of the twenty total tests they run, $p < 0.05$, so they wrongly assume that during one of the twenty tests, the result is significant ($0.05 = 1/20$). xkcd jelly bean comic - "Significant" Title: Significant Hover text:…
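
The arithmetic behind the joke, as a short sketch: with 20 independent tests at $\alpha = 0.05$ and no real effects, the chance of at least one "significant" result is $1 - 0.95^{20} \approx 0.64$.

    alpha <- 0.05
    k     <- 20
    1 - (1 - alpha)^k   # ~0.64: probability of at least one false positive

    # The same thing by simulation: 20 t-tests on pure noise, repeated many times
    set.seed(1)
    sig <- replicate(2000, any(replicate(k, t.test(rnorm(30))$p.value) < alpha))
    mean(sig)           # also roughly 0.64
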
63 votes · 5 answers

Why does collecting data until finding a significant result increase Type I error rate?

I was wondering exactly why collecting data until a significant result (e.g., $p \lt .05$) is obtained (i.e., p-hacking) increases the Type I error rate? I would also highly appreciate an R demonstration of this phenomenon.
Reza · 876
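
A sketch of the kind of R demonstration requested (assumptions: normal data, a one-sample $t$-test after every batch of 10 observations, at most 10 looks): under a true null, stopping as soon as $p < .05$ yields a "significant" result far more often than 5% of the time.

    # One experiment: the null is true (mean 0); test after each batch of 10
    # observations and stop early the first time p < .05
    set.seed(1)
    one_run <- function(looks = 10, batch = 10) {
      x <- numeric(0)
      for (i in seq_len(looks)) {
        x <- c(x, rnorm(batch))
        if (t.test(x, mu = 0)$p.value < 0.05) return(TRUE)   # stop: "significant"
      }
      FALSE
    }

    mean(replicate(5000, one_run()))   # Type I error rate well above the nominal 0.05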