
I learned that a credible interval does not have the frequentist coverage property, but recently I read the following statements, which draw conclusions from a credible interval/region:

Point (0,0) is on the edge line of the 98% credible region for the joint posterior density. The test for overall treatment effects is significant with p-value 0.02.

in this thesis (Page 88) and

We also depict the upper and lower 2.5% posterior quantiles in the figure. From these posterior inferences, we can further identify differentially expressed proteins. For example, if we require the 2.5% quantile above zero or the 97.5% quantile below zero, there are 19 up-regulated and 7 down-regulated proteins.

in this paper (Section 4).
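For concreteness, the rule in the second quote amounts to something like the sketch below (the posterior draws here are simulated placeholders, not the paper's model or data):

```python
# Hypothetical sketch of the quantile rule: call a protein "up-regulated" if its
# 2.5% posterior quantile is above zero, "down-regulated" if its 97.5% posterior
# quantile is below zero. The draws are simulated, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
n_proteins, n_draws = 50, 4000
# posterior_draws[i] stands in for MCMC draws of protein i's log-fold change
true_effects = rng.normal(0, 1, n_proteins)
posterior_draws = rng.normal(loc=true_effects[:, None], scale=0.5,
                             size=(n_proteins, n_draws))

q025 = np.quantile(posterior_draws, 0.025, axis=1)
q975 = np.quantile(posterior_draws, 0.975, axis=1)

up = np.where(q025 > 0)[0]    # entire central 95% interval above zero
down = np.where(q975 < 0)[0]  # entire central 95% interval below zero
print(f"{len(up)} up-regulated, {len(down)} down-regulated")
```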

Are these conclusions proper? If not, what is the proper way to draw conclusions from a credible interval? Any input will be greatly appreciated.

Randel
  • A related and interesting discussion can be found [here](http://stats.stackexchange.com/questions/31679/what-is-the-connection-between-credible-regions-and-bayesian-hypothesis-tests), about the connection between credible regions and Bayesian hypothesis tests, though "it does not seem to solve the point-null hypothesis Bayesian testing problem." – Randel Nov 30 '15 at 17:37

1 Answer


Confidence intervals can be used equivalently to hypothesis tests, but highest density intervals are not the same as confidence intervals. Let's start with what a $p$-value is, quoting Cohen (1994):

What we want to know is "Given this data what is the probability that $H_0$ is true?" But as most of us know, what the $p$-value tells us is "Given that $H_0$ is true, what is the probability of this (or more extreme) data?" These are not the same (...)

So the $p$-value tells us $P(D|H_0)$. In the Bayesian approach we want to learn directly (rather than indirectly) about the probability of a parameter given the data we have, $P(\theta|D)$, by employing Bayes' theorem and a prior for $\theta$:

$$ \underbrace{P(\theta|D)}_\text{posterior} \propto \underbrace{P(D|\theta)}_\text{likelihood} \times \underbrace{P(\theta)}_\text{prior} $$
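As a toy illustration of this update (a conjugate Beta-Binomial example added here for concreteness; the numbers are made up and not from the question), the posterior is available in closed form and can be queried directly:

```python
# Beta-Binomial example of posterior ∝ likelihood × prior (hypothetical numbers):
# a Beta(1, 1) prior on a success probability theta, with 7 successes in
# 20 Bernoulli trials, gives a Beta(1 + 7, 1 + 13) posterior.
from scipy import stats

a_prior, b_prior = 1, 1      # Beta(1, 1) = uniform prior on theta
successes, trials = 7, 20    # observed data D
posterior = stats.beta(a_prior + successes, b_prior + trials - successes)

print(posterior.interval(0.95))  # 95% equal-tailed credible interval for theta
print(posterior.cdf(0.5))        # P(theta < 0.5 | D), a direct posterior statement
```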

So if the 95% confidence interval does not include the null value(s), then you can reject your null hypothesis: your data are more extreme than you would expect given that hypothesis. On the other hand, if in the Bayesian setting your 95% highest density interval does not include the null value(s), then you can conclude that those value(s) lie outside the set of parameter values that together carry 95% of the posterior probability.
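To make the contrast concrete, here is a toy sketch assuming normal data with known variance and a flat prior on the mean (assumptions added here for illustration); under these assumptions the two intervals happen to coincide numerically even though they answer different questions:

```python
# Toy comparison of the two interval checks against the null value 0,
# assuming known sigma = 1 and a flat prior on the mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.4, scale=1.0, size=25)
se = 1.0 / np.sqrt(len(x))

# Frequentist 95% CI for the mean: does it exclude the null value 0?
ci = (x.mean() - 1.96 * se, x.mean() + 1.96 * se)

# Bayesian posterior for the mean under a flat prior: Normal(x_bar, se^2)
posterior = stats.norm(loc=x.mean(), scale=se)
credible = posterior.interval(0.95)  # 95% credible interval (here also the HDI)
p_below_zero = posterior.cdf(0.0)    # P(mean <= 0 | data)

print(ci, credible, p_below_zero)
```

The confidence interval answers "which null values would not be rejected by data like this", while `posterior.cdf(0.0)` is a direct probability statement about the parameter given the data.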

Kruschke (2010) can be quoted for a comparison of the two approaches:

The primary goal of NHST [Null Hypothesis Significance Testing] is determining whether a particular "null" value of a parameter can be rejected. One can also ask what range of parameter values would not be rejected. This range of non-rejectable parameter values is called the confidence interval. (...) The confidence interval tells us something about the probability of extreme unobserved data values that we might have gotten if we repeated the experiment (...)

A concept in Bayesian inference, that is somewhat analogous to the NHST confidence interval, is the highest density interval (HDI), (...) The 95% HDI consists of those values of $\theta$ that have at least some minimal level of posterior believability, such that the total probability of all such $\theta$ values is 95%. (...) The NHST confidence interval, on the other hand, has no direct relationship with what we want to know; there's no clear relationship between the probability of rejecting the value $\theta$ and the believability of $\theta$.

Posterior probability can be used, and is used, for testing hypotheses, but you have to remember that it answers a different question than $p$-values do.
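For example (a generic sketch with simulated draws standing in for MCMC output; the `hdi` helper and the numbers are mine, not from the references), posterior samples let you report both an HDI and a direct posterior probability of the hypothesis of interest:

```python
# Compute a 95% HDI and P(theta > 0 | D) from posterior draws.
import numpy as np

def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the sorted posterior draws."""
    s = np.sort(samples)
    n_in = int(np.floor(mass * len(s)))
    widths = s[n_in:] - s[:len(s) - n_in]  # widths of all candidate intervals
    i = np.argmin(widths)                  # index of the narrowest one
    return s[i], s[i + n_in]

rng = np.random.default_rng(2)
draws = rng.normal(0.3, 0.15, size=10_000)  # stand-in for MCMC draws of theta

lo, hi = hdi(draws)
print(f"95% HDI: ({lo:.3f}, {hi:.3f})")
print("P(theta > 0 | D):", np.mean(draws > 0))
```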

See also: What is the connection between credible regions and Bayesian hypothesis tests? and Why does a 95% Confidence Interval (CI) not imply a 95% chance of containing the mean?


Cohen, J. (1994). The earth is round (p<.05). American Psychologist, 49, 997-1003.

Kruschke, J.K. (2010). Doing Bayesian Data Analysis: A Tutorial with R and BUGS. Academic Press / Elsevier.

Tim
  • +1. So the two quotes in the question are improper? – Randel Dec 01 '15 at 14:10
  • @Randel As for #1, as said, this is not exactly a p-value (though it is often used in a similar manner); as for #2, it seems to describe a different scenario. – Tim Dec 01 '15 at 14:22
  • 1
  • @Randel what I'm saying is that even if you want to use it in a similar fashion as p-values, you have to remember that it is different and concerns different probabilities than p-values. – Tim Dec 01 '15 at 14:27
  • Thanks! Can I think of it this way? For #1, "while often it is used in a similar manner", but we cannot call it a "p-value"; for #2, it directly uses the credible interval for hypothesis testing, but not in the frequentist flavor? It seems not rigorous, e.g., it does not consider multiple testing. – Randel Dec 01 '15 at 14:36
  • 1
  • @Randel with #1 you are correct. But #2 is taken out of context and it is hard to comment on it like this. – Tim Dec 01 '15 at 14:42
  • Thanks a lot @Tim for your help! After closely reading the first question you linked and [MånsT's paper](http://www.sciencedirect.com/science/article/pii/S0378375813002498) therein, I learned that #2 is a proper way to do Bayesian hypothesis testing. MånsT discussed power in the paper, though it's still unclear to me if the type I error rate can also be assessed. Plus the argument in the paper "When (the credible intervals) are viewed as frequentist confidence intervals, these measures of evidence coincide with the p-values of the corresponding two-sided tests" seems to support #1. – Randel Dec 01 '15 at 21:38
  • @Randel check also Kruschke's book, which I quote, for a discussion of this topic. – Tim Dec 01 '15 at 21:40
  • Yes, I read Chapter 12 about hypothesis testing in the second edition, but I did not find a testing procedure using the credible interval. – Randel Dec 01 '15 at 21:54
  • @Randel there are separate parts on CIs and HDIs and on hypothesis testing; CIs are related to testing, so you should look at both topics. Especially in the Bayesian setting, HDIs are used directly for testing, so a good understanding of interval estimation translates into an understanding of testing (and vice versa). – Tim Dec 01 '15 at 21:59
  • It looks like in the confidence-interval approach you scan the columns (hypotheses) of [my Bayesian tables](http://math.stackexchange.com/a/1987205/354060) to see whether the observed-value row appears in the rejection region of the hypothesis, whereas for the credible interval you do what is done in that post: convert the table into one whose rows sum to 100%, select the observed row, and pick the hypotheses with the highest P(H|D) until you are happy with their aggregate probability, leaving the rest as 'uncredible'. Here, a hypothesis is the same as the parameter $\theta$. Is that right? – Little Alien Oct 27 '16 at 11:20
  • @LittleAlien I'm afraid I do not understand what you mean... The difference between the two kinds of intervals is that in the Bayesian case you can actually calculate the probability of $\theta$ values; that is not something confidence intervals tell you. – Tim Oct 27 '16 at 11:27
  • This answer is so amazing! – Jinhua Wang May 07 '19 at 09:38