3

How do I capture a claim of ignorance about a parameter in a Bayesian analysis?

For instance, suppose I observed a binomial random variable $X\sim Bin(n, p)$. Say $X = 5$ and $n = 10$. I want to make inference about $p$.

But now further suppose I don't know anything about $p$ beyond the fact that it is between zero and one. A uniform prior is not a claim of ignorance about $p$, as discussed in the question "What is the Wine/Water Paradox in Bayesian statistics, and what is its resolution?".

So how can I express my ignorance about $p$ and proceed with inference given the observed data?

  • 1
    The uniform prior is a non-informative prior for the proportion of successes in a Binomial. The problem in that paradox is that there is a non-linear link between the variables (since it's a ratio that can go both ways, one the inverse of the other), so you can't place a uniform prior on one of its components without assuming structure in the other. – Firebug Mar 19 '21 at 19:58
  • @Firebug can you define mathematically what a "non-informative" prior is? Why does it express ignorance? I can't see how claiming that $p\sim U(0,1)$ is the same as saying "I don't know anything about $p$ except that it's between 0 and 1". –  Mar 19 '21 at 20:02
  • 2
    The only way you can model *true* ignorance is by refusing to assign any probability distribution at all to the parameter. In Ellsberg's original paper on [his paradox](https://en.wikipedia.org/wiki/Ellsberg_paradox) he pointed out an implication something like this for Bayesian analysis. The standard way around this problem is to assign a "diffuse" prior and then do *post hoc* sensitivity analyses to demonstrate the prior doesn't matter. See, for instance, the Gelman *et al.* textbook. Others abandon Bayesian approaches and use "diffuse probabilities" etc. – whuber Mar 19 '21 at 20:08
  • @whuber, are you saying that decision theory in math is overlapping with Bayesian approach in statistics? – Good Luck Mar 19 '21 at 20:58
  • @bayesian_newbie There are several "non-informative" possibilities for the Binomial, defined from the principle of indifference. The $\mathcal{U}(0,1) = \text{Beta}(1,1)$ is one of them, but other symmetric Beta distributions are also possible choices. See https://stats.stackexchange.com/q/297901/60613 for examples. – Firebug Mar 19 '21 at 21:04
  • 2
    De Finetti would probably argue that there is no such thing as "true ignorance", and would ask you to assign betting rates to different possible outcomes (*before* having seen the data) that respect the axioms of probability. What amount x would you pay for the chance of winning 1 unit of money (UM) if you observe 5 out of 10, if you'd then also be forced to offer 1-x for winning 1 UM if something else happens? Etc. - from this your prior can be reconstructed. (Sorry, not quite sure whether the exact betting setup is correct... it's something like this but I only have 5 minutes to edit...) – Christian Hennig Mar 19 '21 at 21:29
  • This question is pretty similar to https://stats.stackexchange.com/questions/514688/using-prior-intervals-instead-of-prior-distributions-how-should-i-update-my. – fblundun Mar 19 '21 at 21:31
  • +1 @Lewian "True ignorance" is a radical extremist position. Folks uncomfortable with uncertainty sometimes retreat to claims of not knowing anything in a kind of epistemic despair… I think those kinds of claims tend to be vastly overblown. – Alexis Mar 19 '21 at 21:37
  • @fblundun it's different because here I'm asking for the specific uniform example too –  Mar 19 '21 at 21:43
  • @Lewian can you elaborate your point further in an answer? What if I'm not willing to bet anything? –  Mar 19 '21 at 21:45
  • Addition to my earlier posting: The betting setup is correct, however you'd have to *offer* all these bets, and then an opponent can choose which of them to accept. – Christian Hennig Mar 19 '21 at 21:45
  • @Lewian I think this betting thing can get us somewhere, maybe you can prove that I always must have some probabilistic prior belief revealed by my behavior? I find that hard to prove without invoking strong assumptions though. –  Mar 19 '21 at 21:48
  • @bayesian_newbie: See my (updated) answer. – Christian Hennig Mar 19 '21 at 22:11
  • 1
    @Lewian One point of Ellsberg's Paradox is that it casts doubt on the adequacy of De Finetti's position. I don't think this is a "radical extremist position" as another commenter has subsequently claimed. Regardless of one's philosophy, there is a role for the condition of "true ignorance," such as in proving theorems about the correctness of numerical algorithms on finite-precision computing machines. – whuber Mar 19 '21 at 22:55

3 Answers

4

(Much of this answer is copied from another answer of mine here; I do so without further specific attribution of quoted or paraphrased material.)

A good way to proceed here is to conduct your analysis within the imprecise probability framework (see esp. Walley 1991, Walley 2000). In this framework the prior belief is represented by a set of probability distributions, and this leads to a corresponding set of posterior distributions. Below I will show you how this works for your specific example of binomial data.

Before getting to the implementation of this method, the first thing to note here is that the most extreme form of ignorance possible would be to take an imprecise prior distribution composed of the set of all possible priors on the parameter range. This would lead to an imprecise posterior which is the set of all possible posteriors, so your inference would then be vacuous. That is, total ignorance going in leads to total ignorance coming out. Consequently, if we want a useful inference at all, we must frame our "ignorance" in some way that restricts the imprecise prior to a reasonable range of priors, from which a usefully narrow set of posteriors can be formed. This can be done by allowing the prior expectation of the parameter to vary over all possible values in its range, but restricting the prior variance either to a single value or a small range, to get a useful posterior inference.


Application to the binomial model: Suppose we observe data $X_1,...,X_n | \theta \sim \text{IID Bern}(\theta)$ where $\theta$ is the unknown parameter of interest. Usually we would use a beta density as the prior (both the Jeffreys prior and the reference prior are of this form). We can specify this form of prior density in terms of the prior mean $\mu$ and another parameter $\kappa > 1$ as:

$$\pi_0(\theta | \mu, \kappa) = \text{Beta}(\theta | \mu, \kappa) = \text{Beta} \Big( \theta \Big| \alpha = \mu (\kappa - 1), \ \beta = (1-\mu) (\kappa - 1) \Big).$$

(This form gives prior moments $\mathbb{E}(\theta) = \mu$ and $\mathbb{V}(\theta) = \mu(1-\mu) / \kappa$.) Now, in an imprecise model we could set the prior to consist of the set of all these prior distributions over all possible expected values, but with the other parameter fixed to control the precision over the range of mean values. For example, we might use the set of priors:

$$\mathscr{P}_0 \equiv \Big\{ \text{Beta}(\mu, \kappa) \,\Big|\, 0 \leqslant \mu \leqslant 1 \Big\}.$$
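As a sanity check on this parameterisation, here is a short sketch (the function names are my own, not from any particular library) converting $(\mu, \kappa)$ to the standard $(\alpha, \beta)$ parameters and confirming the stated prior moments:

```python
# Sketch: the (mu, kappa) parameterisation Beta(alpha, beta) with
# alpha = mu*(kappa - 1) and beta = (1 - mu)*(kappa - 1)
# should give prior mean mu and prior variance mu*(1 - mu)/kappa.

def beta_params(mu, kappa):
    """Convert (mean, concentration) to standard Beta(alpha, beta) parameters."""
    return mu * (kappa - 1), (1 - mu) * (kappa - 1)

def beta_moments(alpha, beta):
    """Mean and variance of a Beta(alpha, beta) distribution."""
    s = alpha + beta
    return alpha / s, alpha * beta / (s**2 * (s + 1))

mu, kappa = 0.3, 5.0
a, b = beta_params(mu, kappa)
mean, var = beta_moments(a, b)
print(mean, var)   # mean = 0.3, variance = 0.3 * 0.7 / 5 = 0.042
```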

Suppose we observe $s = \sum_{i=1}^n x_i$ positive indicators in the data. Then, using the updating rule for the Bernoulli-beta model, the corresponding posterior set is:

$$\mathscr{P}_\mathbf{x} = \Big\{ \text{Beta}\Big( \tfrac{s + \mu(\kappa-1)}{n + \kappa -1}, n+\kappa \Big) \Big| 0 \leqslant \mu \leqslant 1 \Big\}.$$

The range of possible values for the posterior expectation is:

$$\frac{s}{n + \kappa-1} \leqslant \mathbb{E}(\theta | \mathbf{x}) \leqslant \frac{s + \kappa-1}{n + \kappa-1}.$$

What is important here is that even though we started with a model that was "uninformative" with respect to the expected value of the parameter (the prior expectation ranged over all possible values), we nonetheless end up with posterior inferences that are informative with respect to the posterior expectation of the parameter (they now range over a narrower set of values). As $n \rightarrow \infty$ this range of values is squeezed down to a single point, which is the true value of $\theta$.
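For the question's data ($X = 5$, $n = 10$) this can be sketched numerically; $\kappa = 2$ is an arbitrary choice of mine for illustration. Sweeping the prior mean $\mu$ over a grid should reproduce the closed-form envelope above:

```python
# Imprecise posterior for s = 5 successes in n = 10 trials, with kappa fixed.
# The posterior mean ranges over [s/(n+kappa-1), (s+kappa-1)/(n+kappa-1)]
# as the prior mean mu sweeps [0, 1].

s, n, kappa = 5, 10, 2.0

def posterior_mean(mu):
    # Beta-Bernoulli update: alpha' = s + mu*(kappa-1), alpha'+beta' = n+kappa-1
    return (s + mu * (kappa - 1)) / (n + kappa - 1)

grid = [i / 1000 for i in range(1001)]
means = [posterior_mean(mu) for mu in grid]
lo, hi = min(means), max(means)
print(lo, hi)   # 5/11 and 6/11: the posterior mean interval is about [0.455, 0.545]
```

Even with total prior ignorance about the mean, the posterior interval is much narrower than $[0,1]$, which is the informativeness of the data showing through.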

Ben
  • This is very interesting (+1). –  Mar 19 '21 at 22:15
  • Is there any relation to this answer, using Bayes Linear (BL) Statistics: https://stats.stackexchange.com/questions/514688/using-prior-intervals-instead-of-prior-distributions-how-should-i-update-my –  Mar 19 '21 at 22:16
  • "total ignorance going in leads to total ignorance coming out." There seems to be some flaw in this logic though. Suppose I have data going to infinity $n \to \infty$. A frequentist would easily make a confident claim here, while still being totally agnostic *a priori* about the true parameter value. But your sentence suggests that a Bayesian cannot be totally agnostic, otherwise she would conclude nothing even after seeing infinite data? –  Mar 19 '21 at 22:25
  • This is nicely explained, however it disappoints me that the people who came up with this tend to offer this as some kind of "solution" to the problem that Bayesian priors are not really "fully uninformative" - just to find out that also in this approach a "totally informationless" initialisation doesn't get you anywhere either. – Christian Hennig Mar 19 '21 at 22:27
  • When you allow sets of priors, the asymptotic case becomes a bit subtle. For any finite $n \in \mathbb{N}$ (no matter how large), you can choose a value $\kappa$ that is sufficiently large that the prior dominates the data. Even if you take $n \rightarrow \infty$ *for any fixed prior* then you get an informative result. However, the set of all possible priors will still contain *sequences of priors* that dominate the data at every finite value, so the limit can still be constructed in such a way that the data is non-informative (or completely informative). – Ben Mar 19 '21 at 22:55
1

According to Bruno de Finetti, one of the major proponents of subjective Bayes, prior probabilities can be "elicited". This is a constructive act. De Finetti would hold that there is always some kind of belief about the future that can be expressed by prior probabilities. One way to do this is as follows: Before observing the data, imagine the following "game": You are forced to offer bets of x Kj (Kujambel, let's say that's the unit of money used here) on all kinds of possible outcomes, in which you win 1 Kj in case the outcome (or set of outcomes) occurs. The catch is that if you offer x Kj for winning 1 Kj on event $A$, your betting opponent can choose either to accept this, or to take 1-x for paying you 1 Kj in case $A^c$ occurs. So you have an incentive to choose x not too high (because you may lose it if $A$ doesn't happen), but also not too low (because the opponent may then just take 1-x for $A^c$ rather than x for $A$).
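A small numerical illustration of the coherence argument (my own sketch, not from de Finetti's writing): the opponent picks whichever side of the bet has positive expectation under their belief $p = P(A)$, and their edge vanishes only when your offered rate equals that probability. Coherent betting rates therefore behave like probabilities.

```python
# You offer rate x for "pay x, win 1 if A"; the opponent may instead take
# "pay 1-x, win 1 if not-A". Their best expected gain is |p - x|, which is
# zero only when x equals their probability p for A.

def opponent_edge(x, p):
    take_A = p - x        # pay x, win 1 with probability p
    take_not_A = x - p    # pay 1-x, win 1 with probability 1-p: (1-p) - (1-x)
    return max(take_A, take_not_A)

for x in (0.2, 0.5, 0.7):
    print(x, opponent_edge(x, p=0.5))   # edge is zero only at x = 0.5
```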

Your prior can then be "reverse-engineered" from your bets. Actually this is not totally true - if you have $n=10$ and you assume exchangeable Bernoulli experiments (i.e., independent given $p$), there are not enough observable events to fully reconstruct the prior, and you'd in principle need to think about the case $n\to\infty$. However, you could play this as a thought experiment, imagining that at some point the true value of $p$ is revealed to you (as the limit of outcomes of infinitely many Bernoulli experiments, say); the prior over $p$ then translates into offered bets for all kinds of subsets of $[0,1]$. A sensible way to go about this is probably to look at a number of supposedly "informationless" priors suggested in the literature, work out the resulting probabilities for all kinds of events of interest for $n=10$, and decide (all before having seen the data) which set of bets looks right to you.

"What if I'm not willing to bet anything?" - Well, it's a thought experiment, but if you refuse it anyway, de Finetti will not help you. Tough luck.

"I think this betting thing can get us somewhere, maybe you can prove that I always must have some probabilistic prior belief revealed by my behavior? I find that hard to prove without invoking strong assumptions though." This cannot be proved, rather one can see it as an implicit assumption or axiom. Personally I think of this as constructive, meaning that this is a scheme that will produce a prior that you can take as "yours" if you commit yourself to going with it, without the need to assume anything about beliefs that you have without being aware of them. Your choice whether you accept this or not.

Disclaimer: I am explaining de Finetti's approach here. I'm not claiming that this is the only correct one. This approach is controversial for various reasons, and it's not my aim to defend it against any objection I can imagine. However it is a clear principled approach that gets you somewhere.

Christian Hennig
  • You are basically saying that Bayesians are slaves of probabilities then? This seems to suggest that *all* rational thought can only be expressed as probabilities, which I doubt, honestly. –  Mar 19 '21 at 22:11
  • Well, they choose these probabilities, so you can well say that the probabilities are their slaves... – Christian Hennig Mar 19 '21 at 22:12
  • I mean, a slave of the reasoning tool. –  Mar 19 '21 at 22:13
  • Obviously, if you *choose* to be a Bayesian, you *choose* to use the Bayesian tools! – Christian Hennig Mar 19 '21 at 22:15
  • "This seems to suggest that all rational thought can only be expressed as probabilities, which I doubt, honestly." If you go by my "constructive" interpretation, no such suggestion is made. You can freely decide to express your prior knowledge/belief as probabilities in this way in one case, and decide that you don't want to do that because it's inappropriate in another. I certainly do not claim that everything should be analysed in a Bayesian manner, let alone a subjectivist Bayesian manner. – Christian Hennig Mar 19 '21 at 22:20
-2

Let's say we have parameters $\theta$ and some latent variable $z$. We calculate the combined impact of the parameters as the product of their conditional factors:

$$q\left(z \mid \theta_{\mathcal{O}}\right) \propto p(z) \prod_{k \in \mathcal{O}} q\left(z \mid \theta_{k}\right)$$

If we would like to ignore the parameter $\theta_i$, then we just set $q(z \mid \theta_i)=1$.
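A toy sketch of this update over a discrete latent $z$ (all numbers invented for illustration): multiply the prior by each factor pointwise, renormalise, and observe that replacing a factor by the constant 1 removes that parameter's influence entirely.

```python
# Product-of-factors update over a discrete latent z with two states.
# Setting a factor to the constant 1 is equivalent to dropping it.

def combine(prior, factors):
    """Pointwise product of a discrete prior and a list of factors, renormalised."""
    post = list(prior)
    for f in factors:
        post = [p * v for p, v in zip(post, f)]
    total = sum(post)
    return [p / total for p in post]

prior = [0.5, 0.5]
f1 = [0.9, 0.1]
f2 = [0.4, 0.6]
ones = [1.0, 1.0]   # "ignoring" a parameter: its factor is constant

with_both = combine(prior, [f1, f2])
ignore_f2 = combine(prior, [f1, ones])
only_f1 = combine(prior, [f1])
print(ignore_f2 == only_f1)   # True: the unit factor has no effect
```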

Good Luck