
Say I have a prior distribution $\pi(\theta)$ for a parameter $\theta$.

Then, given observations $x_1, x_2, \ldots, x_n = x$, we have $\pi(\theta \mid x) \propto f(x \mid \theta)\,\pi(\theta)$, which is then used for various forecasting tasks.

Let's say I don't trust the data and would prefer to rely more on the prior distribution. Are there ways to do so?

For example, one method I came up with is to modify the posterior to $\pi(\theta \mid x) \propto f(x \mid \theta)\,\pi(\theta)^a$ for some constant $a$. However, I am not sure whether this is a rigorous way to do it.

Are there good ways to achieve what I want?
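As a concrete illustration of the $\pi(\theta)^a$ idea, here is a hedged sketch assuming a conjugate Beta prior (all numbers are made up; writing $w$ for the exponent to avoid clashing with the Beta shape parameters): raising a Beta$(a, b)$ density to the power $w$ and renormalizing yields Beta$(w(a-1)+1,\, w(b-1)+1)$, which is again a proper distribution, so the construction is at least well-defined in this case.

```python
from scipy.stats import beta

# Since Beta(x; a, b) has kernel x^(a-1) (1-x)^(b-1), raising it to the
# power w and renormalizing gives Beta(w*(a-1)+1, w*(b-1)+1), which is a
# proper Beta as long as both new shape parameters are positive.
def tempered_beta(a, b, w):
    return beta(w * (a - 1) + 1, w * (b - 1) + 1)

base = tempered_beta(10, 10, 1.0)    # original prior, centred at 0.5
strong = tempered_beta(10, 10, 3.0)  # w > 1: more informative (smaller spread)
weak = tempered_beta(10, 10, 0.25)   # w < 1: less informative (larger spread)

# Tempering a symmetric Beta keeps the mode at 0.5 but changes the spread:
print(base.std(), strong.std(), weak.std())
```

For a symmetric Beta$(a, a)$ prior, tempering only widens or narrows the distribution around its mode; for asymmetric priors the mode is preserved but the mean shifts slightly.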

Dimitris Rizopoulos

Preston Lui
    @tim has already given a very good answer but I want to know why you're using the data if you don't trust it? As a side note: An alternative approach (which doesn't seem to be used much in stats) is direct elicitation: you look at the data and take your own beliefs into account and then elicit a probability distribution from this. However, you *should not* use this prior with the same data in the likelihood. – jcken Jun 09 '20 at 19:44

2 Answers


You don't need hacks; this can be tackled with vanilla Bayes' theorem. The more informative your prior is, the more weight it has on the final result. The opposite is also true: the more information your data provides (so also the larger the sample size), the more weight the data has. So just make your prior more informative. To achieve this, use a prior that concentrates more probability mass over the outcomes you assumed to be most likely a priori, e.g. decrease the variance of a normal prior distribution. A stronger prior needs much more evidence to be "convinced" by the data, so your "untrustworthy" data would have to be very strong to overcome it. And if your data does overcome a strong, informative prior, maybe it's not that bad after all?

Below you can find an example of a beta-binomial model where, for the same data and the same model, we use priors that are centred on the same mean for $p$ but differ in how informative they are (from least to most). As you can see, the more informative the prior, the more influence it has on the posterior (the closer the posterior ends up to the prior).

[Figure: prior, re-scaled likelihood, and posterior for the beta-binomial model under priors of increasing informativeness, same data in each panel.]

Disclaimer: if you try reproducing plots like the ones above, notice that I re-scaled the likelihood so that it roughly matches the scale of the prior and posterior. This does not affect the results, but it makes the plots more readable.
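A minimal sketch of such a beta-binomial update (with hypothetical counts, and without the plotting): the more informative the symmetric Beta$(a, a)$ prior, the closer the posterior mean stays to the prior mean of $0.5$.

```python
from scipy.stats import beta

# Hypothetical data: k successes in n Bernoulli trials.
k, n = 13, 20

# Beta(a, a) priors sharing the same mean 0.5 for p, but with
# increasing informativeness as a grows.
priors = {"weak": 1, "moderate": 10, "strong": 100}

# Conjugate update: Beta(a, a) prior + binomial data -> Beta(a + k, a + n - k).
post_means = {name: beta(a + k, a + n - k).mean() for name, a in priors.items()}

for name, m in post_means.items():
    print(f"{name:8s} posterior mean = {m:.3f}")  # drifts toward 0.5 as a grows
```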

By introducing a weight for the prior, as you did in your question, or for the data, as suggested in the comments and the other answer, you would introduce an additional hyperparameter that needs to be tuned. Moreover, this would make the results harder to interpret, since the impact of the prior would become much less traceable. With vanilla Bayes' theorem there is no such problem: the quantities that enter the equation are well-defined and the outcome is a proper probability.

It is also worth asking yourself why you consider this data "untrustworthy". I assumed here that you mean it is "noisy", so you want to compensate by enforcing some assumptions through the prior. Another case is that the data is simply wrong, but then why would you use it at all? If you need to hack your model to ignore the data, then this isn't statistics anymore. Yet another case is that the uncertainties of the individual datapoints are known; then you can always use sample weights, or an errors-in-variables model.
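For that last case, here is a hedged sketch (all numbers hypothetical) of what known per-observation uncertainties buy you in a conjugate normal-mean model: points with a larger known noise variance automatically get less weight in the posterior, with no ad-hoc tempering needed.

```python
import numpy as np

# Conjugate normal model for an unknown mean: prior N(m0, s0^2),
# observations x_i with known per-point noise variances sigma_i^2.
# The posterior combines everything by precision (inverse-variance) weighting.
def posterior_normal_mean(x, sigma, m0, s0):
    prec = 1 / s0**2 + np.sum(1 / sigma**2)            # posterior precision
    mean = (m0 / s0**2 + np.sum(x / sigma**2)) / prec  # posterior mean
    return mean, np.sqrt(1 / prec)

x = np.array([4.0, 4.2, 12.0])      # last point looks suspect
tight = np.array([1.0, 1.0, 1.0])   # trust all points equally
loose = np.array([1.0, 1.0, 10.0])  # declare the suspect point very noisy

m_equal, _ = posterior_normal_mean(x, tight, m0=0.0, s0=5.0)
m_weighted, _ = posterior_normal_mean(x, loose, m0=0.0, s0=5.0)
print(m_equal, m_weighted)  # the weighted estimate is pulled far less by 12.0
```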

Tim
    I don’t agree with this. The issue is that we for some reason don’t "trust the data" (the likelihood). The prior should represent prior beliefs so it’s hard to see that you should somehow be allowed to change it if some assumption underlying the likelihood happen to be violated. – Jarle Tufto Jun 09 '20 at 19:47
  • @JarleTufto it sounds as if "not trusting the data" would justify using some arbitrary approach. How would you quantify "not trusting the data"? If you'd like to downweight the likelihood by a factor of $\alpha$, it's the same as making the prior $\alpha$ times stronger; they are two sides of the same coin. – Tim Jun 09 '20 at 19:58
    No, down-weighting the likelihood is clearly very different (in terms of the final posterior) from making the prior more informative! I agree, however, that what I suggest is also ad hoc. But perhaps it is a reasonable approximation in cases such as non-independent data. – Jarle Tufto Jun 09 '20 at 20:06
  • @JarleTufto regarding "prior beliefs" and the prior being independent of the likelihood, I highly recommend the paper by three respected Bayesian statisticians: https://arxiv.org/abs/1708.07487 – Tim Jun 09 '20 at 20:20
    Agree with @Jarle, in fact tons of people weight the likelihood for data they don’t trust, particularly when they want to combine inferences from different data sources (this is a common strategy in the data fusion, meta analysis, and theoretical Bayesian statistics literature). It’s also pretty clearly different since up-weighting the prior makes it more informative, and hence would tighten the inferences, while downweighting the likelihood makes things more diffuse. – guy Jun 09 '20 at 20:27
  • @guy combining data from different sources vs giving different "weights" to the likelihood and the prior are very different things. Also, that is weighting datapoints (see the last paragraph of the answer) rather than the whole likelihood. Weighting the likelihood or the prior is arbitrary, while using strong, informative priors is not. – Tim Jun 09 '20 at 21:43
    @Tim I'm just giving you examples of things that people actually do. A substantial amount has been written about why you might want to temper the likelihood, so I would caution against dismissing it outright. You are free to argue the merits of this approach, but evidently a lot of people think it is a reasonable thing to do. The weight itself isn't too arbitrary, since you can scale things by the sample size; for example, if you have some untrustworthy data that you think should be worth five "units" of observation because it is unreliable, you can weight by 5/N. – guy Jun 10 '20 at 00:38
    @Tim and, again, making the prior more informative won't have the same effect, because this *increases the net information* which leads to a tighter posterior on average. Downweighting the likelihood *decreases the net information* and hence will result in a flatter posterior. – guy Jun 10 '20 at 00:44
  • @guy I’m just saying that it’s a bit arbitrary and needs tuning, see last part of my answer. Also, I’m *not* saying that you should tune your prior, but that with strong enough prior, you’d need strong data to overcome it. Edited wording for clarity. – Tim Jun 10 '20 at 06:29
    I agree with Jarle, we should not change prior beliefs. Prior beliefs are what they are... prior beliefs (and the fact that the data is not trustworthy should not change them). When our data is not trustworthy, this should be represented by a less sharp likelihood function, and as a result the posterior will resemble the prior relatively more. In the case of this answer that doesn't happen. The prior is made sharper, and that is not correct (to where do you concentrate the prior?). It does not give more weight to the prior but instead completely changes the prior. – Sextus Empiricus Jul 02 '20 at 15:44
  • In short, untrustworthy data should not result in a more concentrated posterior. – Sextus Empiricus Jul 02 '20 at 15:49
  • @SextusEmpiricus the posterior would not be more concentrated than the prior. What the answer says is that if you genuinely hold your prior beliefs, then choose a strong, informative prior that encodes them, so that untrustworthy data would have a hard time overcoming it. It *doesn't* suggest hacking the prior until you find a result you like. – Tim Jul 02 '20 at 16:07
  • @Tim this is very confusing. What do you mean by *choose* a strong informative prior? In your answer you mention an example of how this can be done: *"e.g. decrease variance of normal prior distribution"*. But *how* can that be done? Say I have an untrustworthy IQ test. Normally my prior is N(mu=100, SD=15). You suggest *choosing* a prior with a lower standard deviation; in other words, *adding* fake information that the value is closer to the mean of 100? – Sextus Empiricus Jul 02 '20 at 16:47
  • Maybe I'm missing something, but if the OP just renormalizes after raising the prior to alpha, doesn't that make the prior a proper distribution again, which is then more or less informative, depending on whether alpha is more or less than 1? – Peter Feb 20 '21 at 11:02

If you really don't trust your likelihood, for example if your observations in reality violate an assumption of independence, you could down-weight the likelihood. For example, you could raise the likelihood to a power of 1/2, which in effect reduces the sample size by a factor of two. If you instead up-weight the prior as you propose, this would clearly lead to inconsistencies: for example, in cases where the likelihood happens to be completely flat, your posterior would in general still differ from the prior, which would not make sense.
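To make the effect concrete, here is a hedged beta-binomial sketch (hypothetical counts): raising the binomial likelihood to the power $w$ scales the success and failure counts by $w$, so $w = 1/2$ behaves like observing half the data, and the tempered posterior sits between the prior and the full posterior while being flatter than the full one.

```python
from scipy.stats import beta

# Hypothetical beta-binomial setup: Beta(a, b) prior, k successes in n trials.
a, b = 2, 2
k, n = 30, 40

# Standard conjugate posterior: Beta(a + k, b + n - k).
full = beta(a + k, b + n - k)

# Likelihood raised to the power w: the binomial kernel p^k (1-p)^(n-k)
# becomes p^(w*k) (1-p)^(w*(n-k)), i.e. the effective counts are scaled by w.
w = 0.5
tempered = beta(a + w * k, b + w * (n - k))

# Tempered posterior: pulled back toward the prior mean (0.5) and more diffuse.
print(full.mean(), tempered.mean())
```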

Jarle Tufto
  • Erm..., what would then be the "objective" rule in selecting the power? For instance, if the tails of the hypothetical models are wrong, a power down-weighting does not necessarily help. – Xi'an Oct 28 '20 at 10:14
  • @Xi'an Yes, I fully agree that this would not work for all violations of the model assumptions. – Jarle Tufto Oct 29 '20 at 12:14