Do conjugate priors just lead to a posterior that is a modification of the parameters of the prior?

Question

We know, for example, that the classic beta-binomial conjugate relationship is as follows:

\begin{align} y &\sim \mathrm{Bin}(n,\ \theta) \\ \theta &\sim \mathrm{Beta}(\alpha,\ \beta) \\ \theta \mid y &\sim \mathrm{Beta}(y + \alpha,\ n - y + \beta) \end{align}

Notice that the posterior belongs to the same family as the prior; only its parameters have changed. Is the posterior under a conjugate prior always just the prior with updated parameters?

How do you define the concept of conjugate priors (if not this)? — Juho Kokkala, Apr 01 '16 at 11:53
Two distributions where the posterior distribution is the same as the prior distribution. With this definition, we could have `θ|y ∼ 5*Beta(y + α, n − y + β) + 5` — , Apr 01 '16 at 11:57
$Beta(y+\alpha, n-y+\beta)$ is not "the same" as $Beta(\alpha,\beta)$, so as far as I understand your comment, the classic beta-binomial would not be conjugate, either. (So I apparently don't understand what you mean by 'the same') — Juho Kokkala, Apr 01 '16 at 12:03
I suppose this is related to your other question http://stats.stackexchange.com/questions/204926 -- perhaps you could formulate a more clear question by combining these two — Juho Kokkala, Apr 01 '16 at 12:22
Please learn how to use math typesetting. http://meta.math.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference — Sycorax, Apr 01 '16 at 13:10
In usual practical terms, yes. But it's in the definition, rather than a derived property. — conjectures, Apr 01 '16 at 13:26
I am not sure that the title of the question matches the actual question. Maybe something like "Do conjugate priors just lead to a posterior being a modification of parameters of the prior?" — peuhp, Apr 01 '16 at 13:57
Given that there is an upvoted & accepted answer to this Q, it isn't clear that it is too unclear to be answered. — gung - Reinstate Monica, Apr 01 '16 at 15:30
@Greenparker: I think you mean that if $X \sim \mathrm{Beta}(\alpha, \beta)$, then $f_{X|\alpha, \beta}(x) * 5 + 5$ is not valid. As I read it, $\mathrm{Beta}(\alpha, \beta)$ denotes a random variable, not a density function. So the conditional distribution you have written looks valid to me. — Cliff AB, Apr 01 '16 at 15:49
@JuhoKokkala I think my answer directly answers your query -- and I was surprised that there are alternative definitions of "conjugate." — Sycorax, May 26 '16 at 14:46

Sycorax · Accepted Answer · 2016-05-05T15:02:38.710

This question is actually somewhat subtle, and it brings to attention an interesting quirk of usage that I hadn't noticed before.

For every practical definition of conjugate distributions that I'm familiar with, the posterior of a model using a conjugate prior is a modified form of the prior. The Wikipedia definition, for example, follows this "practicality" (convenience) convention:

> In Bayesian probability theory, if the posterior distributions $p(\theta \mid x)$ are in the same family as the prior probability distribution $p(\theta)$, the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood function.

However, a distinction can be found in the formal definition of conjugacy in Gelman et al.'s *Bayesian Data Analysis*, 3rd edition, p. 35:

> If $\mathcal{F}$ is a class of sampling distributions $p(y \mid \theta)$ and $\mathcal{P}$ is a class of prior distributions for $\theta$, then the class $\mathcal{P}$ is conjugate for $\mathcal{F}$ if $$ p(\theta \mid y) \in \mathcal{P} \text{ for all } p(\cdot \mid \theta) \in \mathcal{F} \text{ and } p(\cdot) \in \mathcal{P}. $$ This definition is formally vague since if we choose $\mathcal{P}$ as the class of all distributions, then $\mathcal{P}$ is always conjugate no matter what class of sampling distribution is used.

Obviously the construction in the final sentence has little practical utility: if every class of sampling distributions has a conjugate class of priors (namely, all distributions), then the distinction between conjugate and non-conjugate priors is trivial. Instead, it is common to take $\mathcal{P}$ to be the set of all densities having the same functional form as the likelihood. This is what gives rise to the practical convenience of conjugacy, namely that the posterior has the same form as the prior, with updated parameters.
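
To make the "same form, updated parameters" point concrete for the beta-binomial case in the question, here is a minimal numerical sketch. It assumes nothing beyond the Python standard library; the values of `alpha`, `beta`, `n`, and `y`, the grid size `m`, and the helper names are all illustrative choices of mine, not anything taken from the references above. The idea is to compute the posterior by brute force (prior times likelihood, normalised on a grid) and compare it with the closed-form $\mathrm{Beta}(y + \alpha,\ n - y + \beta)$ density.

```python
import math

def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta in (0, 1), via log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))

def binom_pmf(y, n, theta):
    """Binomial likelihood of y successes in n trials."""
    return math.comb(n, y) * theta**y * (1 - theta) ** (n - y)

def grid_posterior(y, n, a, b, grid):
    """Brute-force posterior: prior * likelihood, normalised by the midpoint rule."""
    unnorm = [beta_pdf(t, a, b) * binom_pmf(y, n, t) for t in grid]
    width = grid[1] - grid[0]
    total = sum(unnorm) * width
    return [u / total for u in unnorm]

# Illustrative (made-up) prior parameters and data.
alpha, beta, n, y = 2.0, 3.0, 10, 7

m = 2000
grid = [(i + 0.5) / m for i in range(m)]  # cell midpoints, strictly inside (0, 1)

numeric = grid_posterior(y, n, alpha, beta, grid)
closed_form = [beta_pdf(t, y + alpha, n - y + beta) for t in grid]

# Largest pointwise gap between the brute-force and closed-form densities.
max_abs_diff = max(abs(p - q) for p, q in zip(numeric, closed_form))
print(max_abs_diff)
```

The two curves should agree up to numerical error, which is the convenience conjugacy buys: the whole update reduces to the arithmetic `(alpha, beta) -> (y + alpha, n - y + beta)`, with no integration needed. With a non-conjugate prior, only the brute-force route (or some other numerical method) would be available.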

(+1) Wonder if there's a formal definition of what would be practically useful conjugacy then. Perhaps to do with having a sufficient statistic that doesn't grow with the sample size, so the posteriors don't keep getting more complicated than the priors when you update. — Scortchi - Reinstate Monica, Apr 01 '16 at 14:03
@Scortchi I wonder the same thing. I was actually surprised when I looked up the definition to answer this question. At a minimum, the usage is imprecise, but every Bayesian worth a Gibbs sampler **knows** that conjugacy is purely jargon to indicate convenience. — Sycorax, Apr 01 '16 at 14:07
Related: [Aside from the exponential family, where else can conjugate priors come from?](http://stats.stackexchange.com/q/192554/17230). (I should do what I said I was going to do & ask a question about it. Can't just be coincidence that the useful conjugates are nearly all from the exponential family, & have fixed-dimensional sufficient statistics when they're not.) — Scortchi - Reinstate Monica, Apr 01 '16 at 14:16
