
I was looking at my statistics lecture notes on Bayesian inference, and they talk about a p-value. Specifically, they say that a p-value is the probability that the observed effect is due to random/spurious effects. So, if p = 0.05, there is a 5% probability that the observed effect is due to random effects.

Given this, I am trying to understand this in terms of a probability density function, say $f(x)$. How would one use this density function to determine if some event is spurious?

Thomas Moore
  • If those lecture notes indeed characterize the p-value as "the probability that the observed effect is due to random/spurious effects," then they are fundamentally wrong and you should first clear up that misconception. Only then would it be possible to discuss the question about the density in a meaningful way. https://stats.stackexchange.com/questions/31 might be a good place to begin. For an authoritative account elsewhere, search for the ASA's statement on p-values. – whuber Oct 22 '20 at 18:08
  • Can you link to these lecture notes, or provide a screenshot? I have wondered for a long time how these misconceptions are propagated! – Robert Long Oct 22 '20 at 18:47
  • The lecture was demoing an R package that does Bayesian causal inference: https://cran.r-project.org/web/packages/CausalImpact/vignettes/CausalImpact.html It has become a very well-known package. The package returns a p-value, which is what the description in my question above is based on. :) – Thomas Moore Oct 22 '20 at 18:49
  • A systematic search of the site you link to, Thomas, reveals it does not contain any reference to a "p value" whatsoever. That's no surprise, because a Bayesian analysis does not produce a p value. – whuber Oct 22 '20 at 18:53
  • If you run the code for an example, the p-value is produced as part of the output. – Thomas Moore Oct 22 '20 at 18:54
  • Given its Bayesian orientation, I doubt it produces a p value. I see it does offer a "Posterior tail-area probability p"--but that's not a p-value. Regardless, I cannot find any version of the misconception you attribute to this reference. – whuber Oct 22 '20 at 19:00
  • *"So, if p = 0.05, there is a 5% probability that the observed effect is due to random effects."* More specifically, it's the probability that the observed data/effect is due to random effects *conditional* on $H_0$ being true. How often you will observe data due to random effects will also depend on how often $H_0$ is true (if $H_0$ is never true, then you might still observe p-values of .05, but that does not mean that in 1/20 of these cases you observed the result due to random effects). Said differently, the probability of a type I error does not equal the $\alpha$ level. – Sextus Empiricus Oct 23 '20 at 15:52

1 Answer


p-values are not commonly used in Bayesian inference, so this is a bit confusing. See for example What are Bayesian p-values? for some discussion of "Bayesian p-values". I'm going to assume you are talking about a frequentist p-value. It's notoriously hard to state the definition of a p-value correctly. To correct your sentence, you have to add something about the assumption that the null hypothesis is true. Also, it's the other way around: assuming the null hypothesis is true, there is a 5% chance that it would generate an event at least as extreme as the event you witnessed. The p-value says nothing about what happens if that assumption is wrong, so an unqualified "there is a 95% chance the null hypothesis is incorrect" is not a conclusion you can draw.
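
As a rough illustration of that conditioning, here is a small R sketch (using an ordinary one-sample t-test purely as a stand-in; nothing here comes from the CausalImpact package): when the null hypothesis is actually true, about 5% of simulated datasets produce p ≤ 0.05. That is a statement about the behaviour of the test under the null, not about the probability that the null is false.

```r
# Hypothetical simulation: how often p <= 0.05 occurs when H0 is TRUE.
# The one-sample t-test with n = 30 is chosen only for illustration.
set.seed(1)

p_under_null <- replicate(
  10000,
  t.test(rnorm(30, mean = 0))$p.value  # data generated with true mean 0, so H0 holds
)

mean(p_under_null <= 0.05)  # close to 0.05 by construction
```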

To answer your question more directly: if $f$ is the density function of the test statistic under the null hypothesis, a p-value of 0.05 means that there is 5% probability mass, i.e. an area of 0.05 under the curve of $f$, for outcomes at least as extreme as the one you witnessed. You would need to specify whether that area is taken in one tail or in both, i.e. whether the hypothesis is one-tailed or two-tailed.
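
For a concrete (hypothetical) example, suppose the test statistic follows a standard normal distribution under the null; the p-value is then just the tail area of that density beyond the observed value:

```r
# Sketch: p-value as the area under the null density f beyond the observed statistic.
# Assumes a standard normal null distribution purely for illustration.
f <- dnorm          # null density of the test statistic
observed <- 1.96    # hypothetical observed value of the statistic

# One-tailed: area under f to the right of the observed value
p_one_tailed <- integrate(f, lower = observed, upper = Inf)$value

# Two-tailed: count both tails beyond |observed|
p_two_tailed <- 2 * p_one_tailed

c(one_tailed = p_one_tailed, two_tailed = p_two_tailed)  # roughly 0.025 and 0.05
```

In practice you would use the closed-form tail function (here `pnorm`) rather than numerical integration, but the integral makes the "area under the curve of $f$" interpretation explicit.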

Gijs