
Most of this is background; skip to the end if you already know enough about Dirichlet process mixtures. Suppose I am modeling some data with a Dirichlet process mixture, i.e. let $F \sim \mathcal D(\alpha H)$ and, conditional on $F$, assume $$Y_i \stackrel {iid}{\sim} \int f(y | \theta) F(d\theta).$$
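(Not part of the question proper, but for intuition: a minimal sketch, assuming NumPy, of simulating the latent $\theta_i$ cluster structure via the Pólya urn / Chinese restaurant process implied by the DP. This is where the count $t$ of distinct values below comes from; the function name is mine.)

```python
import numpy as np

rng = np.random.default_rng(0)

def crp_table_count(alpha, n, rng):
    """Simulate cluster assignments under F ~ DP(alpha * H) via the Polya urn
    and return t, the number of distinct theta values among n observations."""
    counts = []  # counts[k] = number of theta_i equal to the k-th distinct value
    for i in range(n):
        # observation i joins cluster k w.p. counts[k]/(alpha+i),
        # or starts a new cluster w.p. alpha/(alpha+i)
        probs = np.array(counts + [alpha], dtype=float) / (alpha + i)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
    return len(counts)

t_samples = [crp_table_count(alpha=1.0, n=50, rng=rng) for _ in range(1000)]
print(np.mean(t_samples))  # close to sum_{i=0}^{49} 1/(1+i), about 4.5 for alpha = 1
```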

Here $\alpha > 0$ and $\alpha H$ is the prior base measure. It turns out that if, for each observation $Y_i$, I know the associated latent $\theta_i$, then the likelihood of $\alpha$ in this model is $$L(\alpha | t) \propto \frac{\alpha^t\Gamma(\alpha)}{\Gamma(\alpha + n)},$$ where $t$ is the number of distinct values among the $\theta_i$ (the random measure $F$ is discrete almost surely). Escobar and West develop the following scheme for sampling $\alpha$ under a Gamma prior. First, they write $$ \pi(\alpha | t) \propto \pi(\alpha) \frac{\alpha^t\Gamma(\alpha)}{\Gamma(\alpha + n)} \propto \pi(\alpha)\alpha^{t - 1}(\alpha + n) B(\alpha + 1, n) \\= \pi(\alpha)\alpha^{t - 1} (\alpha + n) \int_0^1 x^\alpha(1 - x)^{n - 1} \, dx,$$ where $B(\cdot, \cdot)$ is the beta function. They then note that if we introduce a latent variable $X \sim \mbox{Beta}(\alpha + 1, n)$, the full conditional of $\alpha$ has the form of a mixture of Gamma distributions, and they use this to write down a Gibbs sampler.
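(As an aside, $L(\alpha | t)$ is easy to evaluate stably on the log scale. A minimal sketch, assuming SciPy and made-up values $n = 50$, $t = 5$; the function name is mine:)

```python
import numpy as np
from scipy.special import gammaln

def log_lik_alpha(alpha, t, n):
    """log L(alpha | t) up to an additive constant:
    t * log(alpha) + log Gamma(alpha) - log Gamma(alpha + n)."""
    return t * np.log(alpha) + gammaln(alpha) - gammaln(alpha + n)

# profile the likelihood on a grid
grid = np.linspace(0.01, 10, 500)
ll = log_lik_alpha(grid, t=5, n=50)
print(grid[np.argmax(ll)])  # rough maximizer of L(alpha | t)
```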

Now, my question: why can't we just write $$ L(\alpha | t) \propto \frac{\alpha^t \Gamma(\alpha)}{\Gamma(\alpha + n)} = \frac{\alpha^t}{\Gamma(n)} \cdot \frac{\Gamma(\alpha)\Gamma(n)}{\Gamma(\alpha + n)} = \frac{\alpha^t B(\alpha, n)}{\Gamma(n)} \\ \propto \alpha^t \int_0 ^ 1 x^{\alpha - 1} (1 - x)^{n - 1} \, dx, $$ and use a single Gamma distribution instead of a mixture of Gammas? If we introduce $X \sim \mbox{Beta}(\alpha, n)$, shouldn't I be able to do the same thing, but without needing the mixture?

Edit for more details: To fill in some gaps, the argument in Escobar and West is that, letting $\alpha$ have a Gamma distribution with shape $a$ and mean $a / b$ (i.e. rate $b$), $$\pi(\alpha | t) \propto \alpha^{a + t - 2} (\alpha + n) e^{-b\alpha} \int_0 ^ 1 x^{\alpha} (1 - x)^{n - 1} \, dx,$$ and so we can introduce a latent $X$ such that $$\pi(\alpha, x | t) \propto \alpha^{a + t - 2} (\alpha + n) e^{-b\alpha}x^{\alpha}(1 - x)^{n - 1}.$$ The full conditionals are a $\mbox{Beta}(\alpha + 1, n)$ distribution for $X$ and a mixture of a $\mathcal G(a + t, b - \log(x))$ and a $\mathcal G(a + t - 1, b - \log(x))$ for $\alpha$.
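(For concreteness, a minimal NumPy sketch of this two-step update. The mixture weight, $\varepsilon / (1 - \varepsilon) = (a + t - 1) / \{n(b - \log x)\}$, comes from normalizing the two Gamma kernels in the full conditional above; the function name is mine.)

```python
import numpy as np

rng = np.random.default_rng(1)

def update_alpha_ew(alpha, t, n, a, b, rng):
    """One Escobar-West Gibbs update for alpha under a Gamma(a, b) prior
    (shape a, rate b), given t distinct theta values among n observations."""
    # Step 1: latent x | alpha, t ~ Beta(alpha + 1, n)
    x = rng.beta(alpha + 1, n)
    # Step 2: alpha | x, t is a two-component mixture of Gammas with
    # mixture odds eps / (1 - eps) = (a + t - 1) / (n * (b - log x))
    c = b - np.log(x)
    odds = (a + t - 1) / (n * c)
    eps = odds / (1 + odds)
    shape = a + t if rng.random() < eps else a + t - 1
    return rng.gamma(shape, 1 / c)  # NumPy parameterizes by scale = 1/rate

# a few sweeps, holding t fixed purely for illustration
alpha = 1.0
for _ in range(5):
    alpha = update_alpha_ew(alpha, t=5, n=50, a=1.0, b=1.0, rng=rng)
    print(alpha)
```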

By the same argument, I got the same result but with $\mbox{Beta}(\alpha, n)$ for $X$ and $\mathcal G(a + t, b - \log(x))$ for $\alpha$. This seems easier to me; why don't they just do that?
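(The corresponding sketch of the single-Gamma variant I am proposing, under the same assumptions and again with a name of my choosing:)

```python
import numpy as np

rng = np.random.default_rng(2)

def update_alpha_single(alpha, t, n, a, b, rng):
    """The single-Gamma variant proposed above:
    x | alpha, t ~ Beta(alpha, n), then alpha | x, t ~ Gamma(a + t, b - log x)."""
    x = rng.beta(alpha, n)
    return rng.gamma(a + t, 1 / (b - np.log(x)))  # scale = 1/rate

alpha = 1.0
for _ in range(5):
    alpha = update_alpha_single(alpha, t=5, n=50, a=1.0, b=1.0, rng=rng)
    print(alpha)
```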

guy

1 Answer


I don't see how what you've written is fundamentally different from Escobar and West.

\begin{eqnarray*} \pi(\alpha|t) &\propto& \pi(\alpha)\pi(t|\alpha) = \pi(\alpha)L(\alpha|t) \\ &\propto& \pi(\alpha)\alpha^t\frac{\Gamma(\alpha)}{\Gamma(\alpha+n)} \\ &\propto& \pi(\alpha)\alpha^t\frac{\Gamma(\alpha)\Gamma(n)}{\Gamma(\alpha+n)} \\ &=& \pi(\alpha)\alpha^tB(\alpha,n) \\ &=& \pi(\alpha)\alpha^{t-1}(\alpha+n)B(\alpha+1,n) \end{eqnarray*} where the second-to-last line is your form and the last line is how E&W have it, and they are equal since \begin{eqnarray*} \alpha B(\alpha,n) &=& \alpha \frac{\Gamma(\alpha)\Gamma(n)}{\Gamma(\alpha + n)} = \frac{(\alpha\Gamma(\alpha))\Gamma(n)(\alpha+n)}{\Gamma(\alpha + n)(\alpha+n)} = (\alpha+n) \frac{\Gamma(\alpha + 1)\Gamma(n)}{\Gamma(\alpha + n + 1)} \\ &=& (\alpha+n)B(\alpha+1,n), \end{eqnarray*} recalling that $\Gamma(z+1) = z\Gamma(z)$.
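(A quick numerical spot check of that identity, assuming SciPy is available and using arbitrary values of $\alpha$ and $n$:)

```python
import numpy as np
from scipy.special import beta

alpha, n = 1.7, 50
lhs = alpha * beta(alpha, n)            # alpha * B(alpha, n)
rhs = (alpha + n) * beta(alpha + 1, n)  # (alpha + n) * B(alpha + 1, n)
print(np.isclose(lhs, rhs))  # True
```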

I'm guessing that they preferred their formulation over yours because it involves only a Beta function term, without the extra $\Gamma(n)$ factor, but I could be wrong. I didn't quite follow the last bit you wrote; could you be more explicit about your sampling scheme?

Daniel Johnson