
I have heard that if you use uniform priors in Bayesian analysis, it is the same as doing frequentist analysis. If you are creating statistical models and you really have no idea about the prior distribution of the parameters in your model (e.g. maybe you only know that the parameters have to be greater than 0, since they represent real-world quantities such as human height), does it really make sense to use Bayesian models?

I have heard of concepts such as non-informative priors and Jeffreys priors - but it still seems to me that you cannot "pull something out of nothing". If you really have no idea about the values and distributions of your parameters, then statistically speaking it would not seem "responsible" to select anything but uniform priors - thus making your analysis quite similar (if not identical) to the frequentist approach.

Does anyone have any comments to add to this?

Thanks

stats_noob
  • Sometimes a frequentist confidence interval has endpoints that go outside the possible values (such as a success probability, which takes values in $(0,1)$). Then choosing even an uninformative beta prior with support $(0,1)$ forces the posterior interval estimate to lie entirely in $(0,1)$. The Gibbs sampler [here](https://stats.stackexchange.com/questions/455129/trying-to-estimate-disease-prevalence-from-fragmentary-test-results) provides an example. – BruceET Oct 17 '21 at 08:48

4 Answers


There are important philosophical differences between a frequentist and a Bayesian approach. For this reason it is not appropriate to say that "if you use uniform priors in Bayesian Analysis, it is the same as doing Frequentist Analysis".

The frequentist idea of probability is that we assume there is an underlying true process that generates the data and is described by the probability model (note that "assuming" this doesn't mean really believing that the model is true, as we can acknowledge that a model is an idealisation).

There are various interpretations of probability that are used with Bayesian statistics, but it is mostly used with an "epistemic" interpretation of probability, meaning that probability models our state of knowledge rather than an underlying real process. Probability calculus in this case requires that you start from a prior formalising your knowledge/belief before observing the data. If you don't have any particular belief or knowledge, default/informationless priors can be used, but there are issues: in many problems there is no agreement on what "informationless" actually means, so there is more than one way to do it, and in some problems certain default priors turn out to have undesirable consequences because they ignore standard ways in which parameters are often connected to each other. De Finetti and others have argued that in any situation there is some kind of background knowledge, even if very weak (like experience with this kind of model in other circumstances), that could be used and give rise to a weakly informative prior.

Note also that there is a distinction between so-called subjectivist and objectivist Bayesians; the former are meant to formalise their personal prior beliefs, involving of course all their knowledge, whereas the latter would only involve (more or less - models are always idealisations) properly secured and scientifically agreed knowledge, and otherwise use default priors.

The bottom line is that a frequentist 95%-confidence interval for a parameter, say $[1,5]$, means that given that reality behaves as the assumed model, the true parameter value will be captured with probability 95% by the interval, i.e., if the interval were computed for many datasets, the true parameter would ultimately end up in 95% of the intervals. Note that this is different from saying that the probability is 95% that the true parameter is between 1 and 5, because in frequentist statistics the parameter is not random, only the data are!
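For illustration, here is a rough simulation sketch of that coverage statement (assuming a normal model with a known standard deviation, so the interval is the simple z-interval): the parameter stays fixed, the intervals vary from dataset to dataset, and about 95% of them capture it.

```python
import numpy as np

rng = np.random.default_rng(1)
true_mu, sigma, n, n_datasets = 3.0, 2.0, 25, 10_000

covered = 0
for _ in range(n_datasets):
    x = rng.normal(true_mu, sigma, n)
    half_width = 1.96 * sigma / np.sqrt(n)        # known-sigma 95% z-interval
    covered += (x.mean() - half_width <= true_mu <= x.mean() + half_width)

print("coverage over repeated datasets:", covered / n_datasets)   # roughly 0.95
```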

A Bayesian 95%-posterior credibility interval $[1,5]$ means that the data should alter your beliefs in such a way that now, with probability 95%, the model with a parameter between 1 and 5 is appropriate. As with frequentist confidence intervals, there is an issue here that many get wrong: according to standard Bayesian philosophy there is no such thing as a true fixed parameter, so it is not correct in a Bayesian setup either to say "the true parameter is between 1 and 5 with probability 95%". However, one can show that the belief formalised in a Bayesian analysis assuming exchangeability implies that ultimately the data will behave as if there were a true frequentist parameter, so that "the true parameter is between 1 and 5 with probability 95%" has some meaning as long as you are willing to rely on your belief model. (Note, however, that since these probabilities are not about data generating processes, the data are not given a chance to disprove your belief model; your belief entirely determines how the data are used. The frequentist would say: if the data are not in line with the model, you need to change/improve your model.)

So you have a choice to make about whether you want epistemic or frequentist probabilities; if you don't think the concept of frequentist probabilities is appropriate, it is certainly worth doing a Bayesian analysis, as a frequentist one won't give you anything useful.

To make matters more complicated, Bayesian analyses can also be done with a frequentist concept of probability in mind. We call this "falsificationist Bayes" here: https://rss.onlinelibrary.wiley.com/doi/full/10.1111/rssa.12276

The advantage of this is that the model can then be checked against the data and improved as a result of such a check, and your inference is about the underlying data generating process. The disadvantage is that it is harder to explain what the prior and the posterior actually mean. Gelman often argues using a distribution of true parameters in more or less similar situations, but one could also choose Bayesian priors not primarily because they encode "prior information", but rather because they have a beneficial impact on results in a frequentist sense, such as regularisation, avoiding otherwise possible degenerate or nonsensical frequentist results, or improving mean squared errors. This, however, means that the posterior cannot be interpreted in the traditional Bayesian way (if the prior doesn't have the traditional Bayesian meaning, neither does the posterior); rather, there needs to be knowledge (which there often is) about the frequentist properties of the Bayesian analysis given true parameters, and the interpretation will then use this, so once more one can't really say "the probability is 95% that the true parameter is between 1 and 5". (This seems to be the most popular interpretation of both confidence and credibility intervals, but it doesn't seem to be properly licensed by any interpretation of probability!)
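As one minimal sketch of a prior doing frequentist work (toy data chosen only to exhibit complete separation; the Normal prior with standard deviation 2.5 is just an illustrative choice): the maximum likelihood slope in logistic regression is unbounded under separation, while the posterior mode under a Gaussian prior (equivalently, a ridge-type penalty) stays finite.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data with complete separation: all y = 0 below x = 0, all y = 1 above.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])

def neg_log_posterior(beta, prior_sd):
    eta = beta[0] + beta[1] * x
    log_lik = np.sum(y * eta - np.logaddexp(0.0, eta))            # logistic log-likelihood
    log_prior = 0.0 if np.isinf(prior_sd) else -np.sum(beta**2) / (2 * prior_sd**2)
    return -(log_lik + log_prior)

mle = minimize(neg_log_posterior, x0=[0.0, 0.0], args=(np.inf,))  # flat prior = plain MLE
map_ = minimize(neg_log_posterior, x0=[0.0, 0.0], args=(2.5,))    # Normal(0, 2.5^2) prior

print("MLE slope (unbounded in theory, huge in practice):", mle.x[1])
print("MAP slope under the Gaussian prior (finite):", map_.x[1])
```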

Personally I think that in such situations a Bayesian approach is really only valuable if it can be argued that and how the prior helps, i.e., if it encodes some information that has the potential to improve the solution, so I wouldn't do this with informationless/default priors - but Gelman probably would (although he has argued in several places that there is often more prior information than people think, and that it goes unused only because they haven't tried hard enough to nail it down).

Christian Hennig

Well, the answer is: it depends. As @BruceET said, Bayesian models can be used to restrict estimates to the possible values via the distribution you assume; furthermore, they tend to be more reliable when evaluating rare events and data with a large number of zero-count cells. However, when you have a very large amount of data, frequentist models tend to be equally reliable.
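A minimal sketch of the zero-count case (toy numbers; the Jeffreys Beta(1/2, 1/2) prior is one possible uninformative choice): with 0 successes in 30 trials the usual Wald interval collapses to a single point at 0, whereas the posterior interval stays strictly inside $(0,1)$, in line with BruceET's comment above.

```python
import numpy as np
from scipy import stats

n, k = 30, 0                                   # 30 trials, 0 successes: a rare event

# Frequentist Wald interval: p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / n)
p_hat = k / n
se = np.sqrt(p_hat * (1 - p_hat) / n)
wald = (p_hat - 1.96 * se, p_hat + 1.96 * se)

# Bayesian: Jeffreys prior Beta(1/2, 1/2) gives the posterior Beta(k + 1/2, n - k + 1/2)
posterior = stats.beta(k + 0.5, n - k + 0.5)
credible = posterior.ppf([0.025, 0.975])

print("Wald 95% interval:", wald)                     # (0.0, 0.0) -- collapses to a point
print("Jeffreys 95% credible interval:", credible)    # strictly inside (0, 1)
```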

It also depends on the type of analysis you are performing: in network meta-analysis, when trials with more than two arms are included, Bayesian analyses tend to perform better than frequentist ones, whereas the two tend to perform similarly when only two-arm trials are included (Bayesian network meta-analysis also gives you better estimates of the probability of being the best treatment, calculated with SUCRA or rankograms).

Claudio Laudani

Bayesian and frequentist analyses are not equivalent under non-informative priors in any general sense. For one thing, Bayesian probabilities are exact and frequentist p-values and confidence intervals are approximate except under a very select choice of models. For another thing, the "equivalence" breaks down if there is more than one look at the data, i.e., sequential analysis is being done.
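To make the multiple-looks point concrete, here is a small simulation sketch (a two-sided z-test with known variance, the null hypothesis true, five unadjusted interim looks): stopping at the first nominal 5% rejection inflates the frequentist type I error well above 5%, which is one way the single-look "equivalence" falls apart.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, looks = 20_000, (20, 40, 60, 80, 100)

rejections = 0
for _ in range(n_sims):
    x = rng.normal(0.0, 1.0, looks[-1])        # null is true: mean 0, known sd 1
    for n in looks:
        z = x[:n].mean() * np.sqrt(n)          # z statistic at this interim look
        if abs(z) > 1.96:                      # nominal two-sided 5% test
            rejections += 1
            break

print("type I error with 5 unadjusted looks:", rejections / n_sims)   # about 0.14, not 0.05
```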

I can't think of a situation where there is no information about a parameter. Think about comparing two treatments for high blood pressure. If you use a flat prior for the treatment effect, such a prior allows the treatment effect to be 100,000 mmHg. We know at the very least that the treatment will not shift mean blood pressure by more than 50 mmHg. We know more than that in many situations, e.g., that the probability of more than a 10 mmHg shift in mean blood pressure is less than 0.025.
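Turning that last constraint into an actual prior (assuming, purely as a sketch, a Normal prior centred at 0 for the treatment effect): requiring the probability of a shift beyond 10 mmHg to be about 0.025 pins down the prior standard deviation.

```python
from scipy import stats

# Constraint: P(shift in mean blood pressure > 10 mmHg) should be about 0.025.
# Under a Normal(0, sd) prior for the treatment effect this means 10 = 1.96 * sd.
sd = 10 / stats.norm.ppf(0.975)                # roughly 5.1 mmHg
prior = stats.norm(0, sd)

print("prior sd (mmHg):", round(sd, 2))
print("P(effect > 10 mmHg):", round(1 - prior.cdf(10), 3))   # about 0.025
print("P(effect > 50 mmHg):", 1 - prior.cdf(50))             # practically zero
```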

Some resources about Bayes may be found here.

Frank Harrell
  • "For one thing, Bayesian probabilities are exact and frequentist p-values and confidence intervals are approximate except under a very select choice of models." - Frequentists could numerically approximate exact probabilities for most if not all situations, which is what Bayesians do applying MCMC and the like. – Christian Hennig Oct 17 '21 at 13:19
  • Bayesian posterior probabilities are exact to within simulation error (which can be reduced to zero if you run long enough). Frequentists have been unable to achieve high accuracy in very non-Gaussian likelihood situations, e.g., binary logistic regression. There is no exact unconditional logistic regression in the frequentist world. – Frank Harrell Oct 17 '21 at 13:31

I'll approach the answer from a practical point of view, not philosophy. You almost always have priors, or can construct them.

In industry, when we approach an expert, we help them formulate priors. We can't simply ask a portfolio manager to supply us with a prior distribution of loan defaults; that won't go anywhere. Instead we approach them with surveys that allow us to formulate the priors. For instance, we may start with a beta distribution and then ask a few questions to determine its shape. The beta distribution is bounded between 0 and 1, which suits this case.
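Here is a sketch of what that survey-to-prior step can look like in code (the elicited numbers are purely hypothetical): match a beta distribution to two elicited quantiles, say a typical default rate of about 2% and 95% confidence that it stays below 5%.

```python
import numpy as np
from scipy import stats, optimize

# Hypothetical survey answers (illustrative numbers only):
#   "a typical default rate is around 2%"      -> median          ~ 0.02
#   "I'd be surprised to see more than 5%"     -> 95th percentile ~ 0.05
targets = {0.50: 0.02, 0.95: 0.05}

def quantile_mismatch(log_params):
    a, b = np.exp(log_params)                  # log scale keeps both parameters positive
    return sum((stats.beta.ppf(p, a, b) - q) ** 2 for p, q in targets.items())

fit = optimize.minimize(quantile_mismatch, x0=np.log([2.0, 50.0]), method="Nelder-Mead")
a, b = np.exp(fit.x)

print(f"elicited prior: Beta({a:.1f}, {b:.1f})")
print("implied 50th/95th quantiles:", stats.beta.ppf([0.50, 0.95], a, b))
```

The fitted beta then serves as the prior for the default-rate parameter, and the survey can be repeated or refined if the implied quantiles don't match the expert's intent.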

Aksakal