Does the beta distribution have a conjugate prior?

Question

I know that the beta distribution is conjugate to the binomial. But what is the conjugate prior of the beta? Thank you.

If one works with one parameter fixed, say Beta$(\theta,1)$, then a conjugate prior for $\theta$ is a Gamma distribution. — StubbornAtom, Apr 14 '20 at 18:45
Also see the question [Random number generation for conjugate distribution of beta distribution](https://stats.stackexchange.com/questions/485170/random-number-generation-for-conjugate-distribution-of-beta-distribution). — Mankka, Nov 05 '20 at 14:00
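
The one-parameter case from the first comment can be checked directly. This is a sketch, assuming the Gamma prior is parameterized by shape $k$ and rate $r$ (symbols introduced here for illustration only) and that $x_1,\dots,x_n$ are iid Beta$(\theta,1)$ with density $f(x\mid\theta)=\theta x^{\theta-1}$ on $(0,1)$:

$$ \pi(\theta) \propto \theta^{k-1} e^{-r\theta}, \qquad \prod_{i=1}^n f(x_i\mid\theta) \propto \theta^{n} \exp\Big(\theta \sum_{i=1}^n \log x_i\Big), $$

$$ \pi(\theta \mid x_1,\dots,x_n) \propto \theta^{k+n-1} \exp\Big(-\theta\Big(r - \sum_{i=1}^n \log x_i\Big)\Big). $$

Under these assumptions the posterior is again Gamma, with shape $k+n$ and rate $r-\sum_i \log x_i$; the rate grows because each $\log x_i$ is negative.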

guy · Answer 1 · 2013-08-17T23:19:28.520

30

Yes, it has a conjugate prior in the exponential family. Consider the three-parameter family $$ \pi(\alpha, \beta \mid a, b, p) \propto \left\{\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\right\}^p \exp\left(a\alpha + b\beta \right). $$ For some values of $(a, b, p)$ this is integrable, although I haven't quite figured out which. I believe $p \ge 0$ and $a < 0, b < 0$ should work: $p = 0$ corresponds to independent exponential distributions, so that definitely works, and the conjugate update involves incrementing $p$, which suggests $p > 0$ works as well.

The problem, and at least part of the reason no one uses it, is that $$ \int_0^\infty \int_0^\infty \left\{\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\right\}^p \exp\left(a\alpha + b\beta \right) \, d\alpha \, d\beta = ? $$ i.e. the normalizing constant doesn't have a closed form.
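
To make the update and the normalization issue concrete, here is a minimal sketch in plain Python. The function names, the truncation bound `upper`, and the grid size `n` are assumptions chosen for illustration and do not come from the answer. It evaluates the unnormalized log-density, applies the conjugate update (add $\sum \log x_i$ to $a$, add $\sum \log(1-x_i)$ to $b$, add $n$ to $p$), and approximates the normalizing constant with a midpoint rule on a truncated square.

```python
import math


def log_gamma_ratio(alpha, beta):
    """log of Gamma(alpha + beta) / (Gamma(alpha) * Gamma(beta))."""
    return math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)


def log_prior_unnorm(alpha, beta, a, b, p):
    """Unnormalized log-density of the three-parameter family."""
    return p * log_gamma_ratio(alpha, beta) + a * alpha + b * beta


def beta_loglik(alpha, beta, xs):
    """Log-likelihood of iid Beta(alpha, beta) observations xs in (0, 1)."""
    return sum(
        log_gamma_ratio(alpha, beta)
        + (alpha - 1.0) * math.log(x)
        + (beta - 1.0) * math.log(1.0 - x)
        for x in xs
    )


def conjugate_update(a, b, p, xs):
    """Posterior hyperparameters (a, b, p) after observing xs."""
    a_new = a + sum(math.log(x) for x in xs)
    b_new = b + sum(math.log(1.0 - x) for x in xs)
    return a_new, b_new, p + len(xs)


def normalizing_constant(a, b, p, upper=40.0, n=400):
    """Midpoint-rule approximation on (0, upper) x (0, upper).

    Assumption: the density is negligible beyond `upper`, which has to be
    checked for the hyperparameters at hand.
    """
    h = upper / n
    total = 0.0
    for i in range(n):
        alpha = (i + 0.5) * h
        for j in range(n):
            beta = (j + 0.5) * h
            total += math.exp(log_prior_unnorm(alpha, beta, a, b, p))
    return total * h * h
```

For $p = 0$ the integral factorizes into two exponential integrals, which gives a quick sanity check on the quadrature. For $p > 0$ the result should be compared across several values of `upper` before trusting it.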

edited Aug 17 '13 at 23:19

answered Aug 15 '13 at 18:55


Ah. That is problematic. I was going to look for an uninformative version of the conjugate prior anyway, so looks like I might as well start with uniform priors over the two parameters. Thanks. – Brash Equilibrium Aug 15 '13 at 20:42
You don't need to normalize it if you're just comparing likelihoods… – Neil G Aug 17 '13 at 23:26
I think you might be missing the action of $p$ in your $\exp$ term as well. It should probably be $pa\alpha$, etc. – Neil G Aug 17 '13 at 23:38
@NeilG $p$ is in the $\exp$; you just have to express things in terms of $\log \Gamma(\cdots)$ instead of $\Gamma(\cdots)$. Using $pa\alpha$ is just a reparametrization; it changes nothing. I'm not sure what you mean by "just comparing likelihoods". You can't implement a Gibbs sampler with this prior without using something like Metropolis, which kills the advantage of conditional conjugacy. The normalizing constant also depends on $a$ and $b$, which rules out putting a prior on them or estimating them by likelihood methods, etc. – guy Aug 17 '13 at 23:44
You're right that putting $p$ there is a reparameterization. However, should that be a triple integral, since you want to integrate over all three parameters… and $a$ and $b$ as you have them should be negative so two of the integrals should run over the negative reals. – Neil G Aug 17 '13 at 23:47
@NeilG integral is over $\alpha$ and $\beta$ since those are the random variables. – guy Aug 18 '13 at 01:59
Ah, you're right! Not sure what I was thinking… – Neil G Aug 18 '13 at 02:15
Does this type of prior assume $\alpha$ and $\beta$ are independent? What if they are not? – Jon May 28 '17 at 21:38

score 29 · Accepted Answer · edited Apr 13 '17 at 12:44

It seems that you already gave up on conjugacy. Just for the record, one thing that I've seen people do (but don't remember exactly where, sorry) is a reparameterization like this. If $X_1,\dots,X_n$ are conditionally iid, given $\alpha,\beta$, such that $X_i\mid\alpha,\beta\sim\mathrm{Beta}(\alpha,\beta)$, remember that $$ \mathbb{E}[X_i\mid\alpha,\beta]=\frac{\alpha}{\alpha+\beta} =: \mu $$ and $$ \operatorname{Var}[X_i\mid\alpha,\beta] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} =: \sigma^2 \, . $$ Hence, you may reparameterize the likelihood in terms of $\mu$ and $\sigma^2$ and use as a prior $$ \sigma^2\mid\mu \sim \mathrm{U}[0,\mu(1-\mu)] \qquad \qquad \mu\sim\mathrm{U}[0,1] \, . $$ Now you're ready to compute the posterior and explore it by your favorite computational method.
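
The reparameterization can be sketched in a few lines of plain Python. The function names are hypothetical, and the inversion uses $\alpha+\beta = \mu(1-\mu)/\sigma^2 - 1$, which follows from the two moment formulas above. The joint prior density implied by the two uniforms is $1/(\mu(1-\mu))$ on the region $0 < \sigma^2 < \mu(1-\mu)$, and that is what the log-prior term encodes.

```python
import math


def to_alpha_beta(mu, var):
    """Map (mean, variance) of a Beta distribution back to (alpha, beta)."""
    if not (0.0 < mu < 1.0 and 0.0 < var < mu * (1.0 - mu)):
        raise ValueError("need 0 < mu < 1 and 0 < var < mu * (1 - mu)")
    nu = mu * (1.0 - mu) / var - 1.0  # nu = alpha + beta
    return mu * nu, (1.0 - mu) * nu


def to_mean_var(alpha, beta):
    """Mean and variance of Beta(alpha, beta)."""
    s = alpha + beta
    return alpha / s, alpha * beta / (s * s * (s + 1.0))


def log_posterior_unnorm(mu, var, xs):
    """Unnormalized log-posterior of (mu, var) given iid Beta data xs.

    Prior: mu ~ U[0, 1] and var | mu ~ U[0, mu * (1 - mu)], so the joint
    prior density is 1 / (mu * (1 - mu)) on its support.
    """
    if not (0.0 < mu < 1.0 and 0.0 < var < mu * (1.0 - mu)):
        return float("-inf")
    alpha, beta = to_alpha_beta(mu, var)
    log_prior = -math.log(mu * (1.0 - mu))
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    loglik = sum(
        log_norm + (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x)
        for x in xs
    )
    return log_prior + loglik
```

A function like `log_posterior_unnorm` can be handed to whichever method is preferred for exploring the posterior, for example a grid over $(\mu, \sigma^2)$ or a Metropolis sampler.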


answered Aug 18 '13 at 22:23

Zen


No, not MCMC this thing! Quadrature this thing! only 2 parameters - quadrature is the "gold standard" for small dimensional posteriors, both for time and accuracy. – probabilityislogic Aug 18 '13 at 22:47
Another option is to regard $\psi = \alpha + \beta$ as a measure of precision, and again use $\mu = \frac{\alpha}{\alpha + \beta}$ as the mean. This is done all the time with Dirichlet processes, and the beta distribution is a special case. So maybe toss a gamma or log-normal prior on $\psi$ and uniform on $\mu$. – guy Aug 18 '13 at 23:34
To be sure, this isn't conjugate, correct? – guy Aug 19 '13 at 16:23
Definitely not! – Zen Aug 19 '13 at 16:53
Hi @Zen, I'm dealing with this problem right now, but I'm new to Bayesian methods and I'm not sure I understand the idea. I figured that you were proposing to find $\int_{0}^{1}\frac{1}{\mu(1-\mu)}\,d\mu$ and then use the reparametrization, but of course that was not the idea. Can you please help me understand? – Red Noise Mar 12 '17 at 15:54
Another option is to parameterize by "concentration", $c = \sqrt{\alpha^2 + \beta^2}$, which scales independent of the mean. [Velten et al., 2015](http://msb.embopress.org/content/11/6/812) use this with uniform prior on the mean and log-normal prior on $c$. Details are in [their supplement (PDF)](http://msb.embopress.org/content/msb/11/6/812/DC1/embed/inline-supplementary-material-1.pdf?download=true). – merv Jul 06 '18 at 22:30
@Zen can you clarify what is the purpose of this reparameterization? How is this more helpful than working in the original parameterization? – Ceph Nov 02 '18 at 19:34
@probabilityislogic Does quadrature mean numerical integration, or something different (or perhaps something more specific)? – ashman May 15 '21 at 21:14

score 10 · Answer 3 · answered Aug 15 '13 at 09:38

In theory there should be a conjugate prior for the beta distribution. This is because the beta distribution is one of the exponential family distributions, and in theory it should be possible to derive a conjugate prior for any distribution in that family. See, e.g., wikipedia, D Blei's lecture on exponential families.

However the derivation looks difficult, and to quote A Bouchard-Cote's Exponential Families and Conjugate Priors

An important observation to make is that this recipe does not always yields a conjugate prior that is computationally tractable.

Consistent with this, there is no prior for the Beta distribution in D Fink's A Compendium of Conjugate Priors.

The derivation is not difficult — See my answer: http://mathoverflow.net/questions/63496/what-can-be-said-about-an-infinite-linear-chain-of-conjugate-prior-distributions/65203#65203 — Neil G, Aug 17 '13 at 23:27

user37239 · Answer 4 · 2015-07-20T19:14:50.813

4

Robert and Casella (RC) happen to describe the family of conjugate priors of the beta distribution in Example 3.6 (pp. 71-75) of their book, Introducing Monte Carlo Methods in R, Springer, 2010. However, they quote the result without citing a source.

Added in response to gung's request for details. RC state that for distribution $B(\alpha, \beta)$, the conjugate prior is "... of the form

$$ \pi(\alpha,\beta) \propto \Big\{ \frac{\Gamma(\alpha+\beta)} {\Gamma(\alpha)\Gamma(\beta)} \Big\} ^{\lambda} x_0^{\alpha} y_0^{\beta} $$

where $\{\lambda, x_0, y_0\}$ are hyperparameters, since the posterior is then equal to

$$ \pi(\alpha,\beta \vert x) \propto \Big\{ \frac{\Gamma(\alpha+\beta)} {\Gamma(\alpha)\Gamma(\beta)} \Big\} ^{\lambda} (xx_0)^{\alpha} ((1-x)y_0)^{\beta}." $$

The remainder of the example concerns importance sampling from $\pi(\alpha,\beta \vert x)$ in order to compute the marginal likelihood of $x$.
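
As a check on the update rule, here is a sketch of the posterior worked out from the prior above. It assumes the standard density $f(x \mid \alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1}(1-x)^{\beta-1}$ and a single observation $x$; factors that do not depend on $(\alpha,\beta)$ are absorbed into the proportionality sign.

$$ \pi(\alpha,\beta \vert x) \propto \Big\{ \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \Big\}^{\lambda} x_0^{\alpha} y_0^{\beta} \cdot \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1}(1-x)^{\beta-1} \propto \Big\{ \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \Big\}^{\lambda+1} (x x_0)^{\alpha} ((1-x) y_0)^{\beta}. $$

Under that assumption the exponent on the Gamma ratio increases by one per observation, so it comes out as $\lambda + 1$ and not the $\lambda$ that appears in the quoted posterior.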

edited Jul 20 '15 at 19:14

answered Jul 20 '15 at 16:11


I don't have Robert's book available but the posterior is $\pi(\alpha, \beta) \propto \left( \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \Gamma(\beta)} \right)^{\lambda +1} (x x_0)^{\alpha-1} \left(y_0 (1-x) \right)^{\beta-1}$. Robert also posted on this topic here http://mathoverflow.net/questions/20399/conjugate-prior-of-the-dirichlet-distribution#comment553934_20875 – Fred Schoen Jan 13 '16 at 20:22
I humbly recommend that the original poster updates the post to indicate that the posterior given in the textbook is incorrect, per Fred Schoen's comment (which is easily verified). – RMurphy May 29 '17 at 21:04

score 2 · Answer 5 · answered Aug 15 '13 at 02:08

I do not believe there is a "standard" (i.e., exponential family) distribution that is the conjugate prior for the beta distribution. However, if one does exist it would have to be a bivariate distribution.

I have no idea about this question, but I did find this handy conjugate prior map that seems to support your answer: http://www.johndcook.com/conjugate_prior_diagram.html – Justin Bozonier Aug 15 '13 at 03:10
The conjugate prior is in the exponential family and has three parameters — not two. – Neil G Aug 17 '13 at 23:25
@Neil, you are definitely right. I guess I should have said it would have to have at least two parameters. – Aug 18 '13 at 18:28
-1: this answer is clearly wrong in the claim that "conjugate prior does not exist in the exponential family", as is demonstrated in the [answer](https://stats.stackexchange.com/a/67511/163572) above... – Jan Kukacka Apr 24 '18 at 08:34
