Can we talk about statistical significance using Bayesian Inference?

Question

In short: can we use the term "statistical significance" when interpreting hypothesis-testing results in Bayesian inference? Or is it only correct to use it in the frequentist approach?

Background:

I am using the Causal Impact R package developed by Google. You can check the git repo here and an intuitive example here. This is the original paper.

The model is a Bayesian Structural time series model. Therefore it uses Bayesian Inference to determine whether the causal effect exists.

The objective of the package is to determine whether there is a causal effect in a time series during a period in which an event takes place (for example, while a marketing campaign is running).

The package's summary function returns a message interpreting the results of the model. It explicitly talks about statistical significance.

These are the results of the package:

I don't believe the developers of the package are wrong. Therefore I would like to understand why they are using the concept of statistical significance.

Thank you in advance.

The package is not "developed by Google", as can clearly be seen in the document linked under "R" above. — Christian Hennig, Jan 17 '22 at 14:22
@christianHennig You are right, it is not strictly developed by Google. However it is the one recommended and used by Google. A very helpful video is the following: https://www.youtube.com/watch?v=GTgZfCltMm8 where KH Brodersen (Senior Quantitative Analyst at Google) goes through the package. — Angel, Jan 19 '22 at 07:50

Christian Hennig · Answer 1 · 2022-01-17T14:33:37.660

1

This is a potentially controversial question; I think you'll find some who'd say "yes" and some who'd say "no".

A key issue here is that there are mathematical terms that have a clear mathematical definition, and one can then say clearly whether they are used correctly or not, and general language/interpretation terms that can have an ambiguous meaning. Statistical significance, in my view, is uncomfortably between the two. One can give it a formally well defined meaning within the frequentist significance testing framework, but some may argue that this would be too narrow a concept.

A rather broad definition would be that data indicate significant evidence against a model if they fall in a set with low probability under that model (obviously requiring the specification of a significance level, say 5%), and the construction of the set from the data needs to be pre-specified. In the case of your Bayesian example it can probably be argued that it was decided, or at least could have been decided, before knowing the actual data that the specific prediction interval computed there would be of interest. This definition may apply to the example (in fact I cannot be sure, see the disclaimer below). Had a specific model without intervention effect been true, the average response would have been expected with large probability in a certain interval, but it didn't occur there, so there's significant evidence against that model according to the definition above.

Note though that this model in a Bayesian setting involves the prior; there's "significant evidence" in this way not against the sampling model alone, but against the sampling model plus the prior, and somebody could argue (obviously knowing the details, which I don't) that with another prior it wouldn't have been significant. "Significance" in this way would apply if the model were to be interpreted in a frequentist manner, i.e., with a frequentist process of parameter generation as described by the prior (sometimes referred to as "empirical Bayes").
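A toy simulation can illustrate the dependence on the prior. The sketch below is not the CausalImpact model; it assumes a hypothetical conjugate setup in which a mean `theta` has a normal prior with mean 0 and the statistic is the sample mean of `n_obs` normal observations. The function name and all parameter values are illustrative assumptions. The 95% prediction interval is computed from the prior predictive distribution, and the code counts how often the statistic falls outside it.

```python
import math
import random
from statistics import NormalDist


def prior_predictive_rejection_rate(n_sims=20000, n_obs=10, prior_sd=1.0,
                                    noise_sd=1.0, level=0.05,
                                    fixed_theta=None, seed=0):
    """Fraction of simulated sample means outside the prior predictive interval.

    Assumed toy model: theta ~ N(0, prior_sd^2), ybar | theta ~ N(theta, noise_sd^2 / n_obs).
    If fixed_theta is given, theta is held at that value instead of drawn from the prior.
    """
    rng = random.Random(seed)
    # Prior predictive standard deviation of the sample mean.
    pred_sd = math.sqrt(prior_sd ** 2 + noise_sd ** 2 / n_obs)
    # Half-width of the central (1 - level) prediction interval around 0.
    half_width = NormalDist().inv_cdf(1 - level / 2) * pred_sd

    rejections = 0
    for _ in range(n_sims):
        theta = rng.gauss(0.0, prior_sd) if fixed_theta is None else fixed_theta
        ybar = rng.gauss(theta, noise_sd / math.sqrt(n_obs))
        if abs(ybar) > half_width:
            rejections += 1
    return rejections / n_sims


# Parameters generated by the prior: the interval behaves like a 5% test.
rate_from_prior = prior_predictive_rejection_rate()
# Parameter fixed at a value the prior considers unusual: the rate is far from 5%.
rate_fixed_theta = prior_predictive_rejection_rate(fixed_theta=2.0)
print(rate_from_prior, rate_fixed_theta)
```

In this toy setup the nominal 5% rate holds only when the parameter really is generated as the prior describes; with a fixed parameter value the same interval can reject far more or far less often.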

There are also Bayesians who would object to the use of the word here because they are very critical of frequentist significance tests and therefore think that the term "significance" should be avoided in Bayesian statistics. Something to that effect (you need to read a bit between the lines) is stated here: https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913

I personally am fine with the use you quote here (see the disclaimer though), but the dependence on the prior is crucial to acknowledge.

Disclaimer: In general, Bayesian prediction/credibility intervals may or may not be valid regarding frequentist properties (in particular regarding frequentist "significance"). This depends on the specific setup, prior choices etc. I haven't checked the linked software, so I don't know whether frequentist "significance" as explained above applies in this particular case.

edited Jan 17 '22 at 14:33

answered Jan 14 '22 at 10:43

Christian Hennig



I know the whole Bayes vs. Frequentist debate is a heated one, and we definitely don't want to get into it here, but I think it's important to stress again that *statistically significant* **is** a technical term; using it this way with Bayesian estimates is 100% technically wrong, and the OP is right to be confused by this. Yes, the term is often misused to mean something broader, and pragmatically people often understand it this way, but we shouldn't let this muddy the waters. – Eoin Jan 14 '22 at 11:13
While we're at it, the "unlikely to be due to random fluctuations" bit is also absolutely wrong from a technical point of view (but also a common mistake), so who knows what's going on here. – Eoin Jan 14 '22 at 11:15
@Eoin No it's not "100% technically wrong". If you write down the initial Bayesian model *including* the prior, what I describe defines a valid frequentist significance test of that model. (There are situations in which the prior can be given a frequentist meaning - there's empirical Bayes!) – Christian Hennig Jan 14 '22 at 11:15
Incorrect. It is 100% technically wrong. – Frank Harrell Jan 14 '22 at 13:32
@FrankHarrell What about explaining your point? My point is: You write down a Bayesian model including a prior. There are parameters in the sampling distribution that correspond to intervention effects. Under that model you construct a 95% prediction interval for a certain statistic assuming no intervention effect. Then you observe the data. Whether the statistic is in the interval or not constitutes a valid (though probably not optimal) 5% test of the full model for no intervention effect including the prior. – Christian Hennig Jan 14 '22 at 14:02
No, these are not parameters in a sampling distribution. These are unknown parameter values generating the one dataset in front of you. That is a drastically different idea. The rest of what you wrote is a confusing mix of Bayesian and frequentist ideas. Keep them separate for our sanity :-) There are no sampling distributions with Bayes. – Frank Harrell Jan 14 '22 at 18:20
@FrankHarrell The point you're making seems to be an interpretative one, not a technical one. Bayesian models and computations can be used with frequentist interpretation, although not many do that, see https://statmodeling.stat.columbia.edu/2018/06/17/bayesians-are-frequentists/ . I have also seen the term sampling distribution used with a Bayesian meaning, e.g., here https://en.wikipedia.org/wiki/Bayesian_inference People use these terms with different meanings. You are right that this is confusing, but it's a fact. Better write your own answer to make your point. – Christian Hennig Jan 14 '22 at 18:41
I'll just leave it with the observation that a sampling distribution is at odds with Bayesian ideas. – Frank Harrell Jan 14 '22 at 19:10
But it is still true that many Bayesian procedures have good frequentist operating characteristics. But that is only in the fixed sample size situation where there is only one look at the data (fully pre-specified). – Frank Harrell Jan 14 '22 at 19:21
My conclusion is that using "statistical significance" in Bayesian inference is debatable. Therefore I would use "significant evidence" instead (just wording) to avoid confusion. My biggest concern now is to understand why the R package that Google developed (Causal Impact) uses the posterior tail-area probability (see image in the Question) to reject the null hypothesis that there was no causal effect, since this is for me a clearly frequentist approach. Any thoughts on that? – Angel Jan 17 '22 at 11:53
The software seems to be based on questionable statistical methods and the 'significance' interpretation is troubling. The idea that you can treat a difference from an expected value as causal needs examination. But the simple answer is just to compute posterior probabilities of interest and be done with it; don't use credible intervals as the main way to look at evidence for an effect. Compute the posterior probability $\Pr(\theta > 0 \mid \text{data}, \text{prior})$. – Frank Harrell Jan 17 '22 at 13:03
I have added a disclaimer to my answer and done some edits, reflecting the fact that what I had claimed before may or may not hold in the discussed software, depending on the specifics of the situation. My point was meant to be a general one - Bayesian prediction intervals *can* be of a form that significance statements can be made that are valid in a frequentist sense. This is sometimes true but not always. – Christian Hennig Jan 17 '22 at 14:36
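
The posterior probability $\Pr(\theta > 0 \mid \text{data}, \text{prior})$ suggested in the comments can be sketched for a simple case. This is a minimal illustration, not what CausalImpact computes: it assumes a hypothetical conjugate normal model with known noise standard deviation, and the function name, arguments, and example values are all made up for the illustration.

```python
import math
from statistics import NormalDist


def posterior_prob_positive(ybar, n_obs, noise_sd, prior_mean=0.0, prior_sd=1.0):
    """Pr(theta > 0 | data, prior) in an assumed normal-normal model.

    Assumed toy model: theta ~ N(prior_mean, prior_sd^2) and the sample mean
    ybar | theta ~ N(theta, noise_sd^2 / n_obs), with noise_sd known.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n_obs / noise_sd ** 2
    # Conjugate update: precisions add, the mean is a precision-weighted average.
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * ybar)
    # Posterior mass above zero.
    return 1.0 - NormalDist(post_mean, math.sqrt(post_var)).cdf(0.0)


# Same hypothetical data under a moderate prior and a prior concentrated near zero.
p_default_prior = posterior_prob_positive(ybar=0.5, n_obs=10, noise_sd=1.0)
p_tight_prior = posterior_prob_positive(ybar=0.5, n_obs=10, noise_sd=1.0, prior_sd=0.1)
print(p_default_prior, p_tight_prior)
```

The same data give a smaller posterior probability of a positive effect under the prior concentrated near zero, which is the dependence on the prior that the answer asks to acknowledge.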
