
I was looking at the Gibbs Sampler when I stumbled upon the following example:

Suppose $y = (y_{1}, y_{2}, \ldots, y_{n})$ are iid observations from an $N(\mu, \tau^{-1})$ distribution, where $\tau$ denotes the precision (the inverse of the variance).

Furthermore, suppose there exists prior information on $\mu$ and $\tau$ indicating that they are independent and that $\mu$ follows an $N(\mu_{0}, \sigma_{0}^{2})$ distribution while $\tau$ follows a $\text{Ga}(\alpha_{0}, \beta_{0})$ distribution.

For this example,

$$ p(\mu, \tau|y_{1}, y_{2}, \ldots, y_{n}) \propto f(y|\mu, \tau)p(\mu, \tau)$$

Also,

$$ \mu \mid \tau, y \sim N\left(\frac{n\bar{y}\tau+\mu_{0}\tau_{0}}{n\tau+\tau_{0}},\ (n\tau+\tau_{0})^{-1}\right) $$

where $\tau_{0} = (\sigma_{0}^{2})^{-1}$

and

$$ \tau \mid \mu, y \sim \text{Ga}\left(\alpha_{0}+\frac{n}{2},\ \beta_{0}+\frac{S_{n}}{2}\right) $$

where $S_{n} = \sum_{i=1}^{n}(y_{i}-\mu)^{2}$

Because it is not easy to sample directly from this joint posterior, the Gibbs sampler can be used, provided the conditional distributions are known.
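For concreteness, here is a minimal sketch of the resulting Gibbs sampler in Python/NumPy, assuming the two full conditionals stated above (the function name, starting values, and defaults are arbitrary choices for illustration):

```python
import numpy as np

def gibbs_normal(y, mu0, tau0, alpha0, beta0, n_iter=5000, seed=0):
    """Gibbs sampler for y_i ~ N(mu, 1/tau) with independent priors
    mu ~ N(mu0, 1/tau0) and tau ~ Ga(alpha0, beta0)."""
    rng = np.random.default_rng(seed)
    n, ybar = len(y), float(np.mean(y))
    mu, tau = ybar, 1.0  # arbitrary starting values
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # mu | tau, y ~ N((n*ybar*tau + mu0*tau0)/(n*tau + tau0), 1/(n*tau + tau0))
        prec = n * tau + tau0
        mu = rng.normal((n * ybar * tau + mu0 * tau0) / prec, 1.0 / np.sqrt(prec))
        # tau | mu, y ~ Ga(alpha0 + n/2, beta0 + S_n/2); the second parameter
        # is a rate, while numpy's gamma takes a scale, hence the reciprocal
        Sn = float(np.sum((y - mu) ** 2))
        tau = rng.gamma(alpha0 + 0.5 * n, 1.0 / (beta0 + 0.5 * Sn))
        draws[t] = mu, tau
    return draws
```

For instance, `gibbs_normal(data, mu0=0.0, tau0=1e-5, alpha0=1.0, beta0=1e-5)` would correspond to fairly vague priors.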

Could anybody demonstrate how to obtain (i.e. provide the derivations for) the conditional distributions $p(\mu|\tau, y)$ and $p(\tau|\mu, y)$ given above?

EDIT:

For example, to obtain $p(\mu|\tau, y)$, is the following valid?

$$ p(\mu|\tau, y) \propto p(\tau, y|\mu)p(\mu)$$

If so, what form will $p(\tau, y|\mu)$ have?

user9171

1 Answer


Ok, what you need to do is compute the joint posterior up to a constant, i.e. $f(y_1,...,y_n|\mu,\tau)p(\mu,\tau)$. Then to compute the conditional posterior $\pi(\mu|\mathbf{y},\tau)$ you just treat the $\tau$ terms as fixed and known, so that some of them can be cancelled out. Then you do the same thing with $\mu$ to get $\pi(\tau|\mathbf{y},\mu)$.
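To spell that out, the likelihood and the two priors are, up to constants,
\begin{equation}
f(\mathbf{y}|\mu,\tau) \propto \tau^{\frac{n}{2}} \exp\{-\frac{\tau}{2}\sum_{i=1}^{n}(y_i-\mu)^2\}, \qquad
p(\mu) \propto \exp\{-\frac{\tau_0}{2}(\mu-\mu_0)^2\}, \qquad
p(\tau) \propto \tau^{\alpha_0-1}e^{-\beta_0\tau},
\end{equation}
and multiplying the three together gives the expression below.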

So what I mean is, the joint posterior is given by: \begin{equation} \pi(\mu,\tau|\mathbf{y}) \propto \tau^{\frac{n}{2}+\alpha_0-1} \exp\{-\frac{\tau}{2} \sum_{i=1}^{n}(y_i-\mu)^2-\frac{\tau_0}{2}(\mu-\mu_0)^2-\beta_0\tau\}. \end{equation}

Now, to get the conditional posterior $\pi(\tau|\mathbf{y},\mu)$ you just remove the factors that involve only $\mu$, since they merely multiply the expression by a constant. So we could re-write the joint posterior as: \begin{equation} \pi(\mu,\tau|\mathbf{y}) \propto \tau^{\frac{n}{2}+\alpha_0-1} \exp\{-\frac{\tau_0}{2}(\mu-\mu_0)^2\}\exp\{-\frac{\tau}{2} \sum_{i=1}^{n}(y_i-\mu)^2-\beta_0\tau\}. \end{equation}

Now, since we are treating $\mu$ as fixed and known, for the conditional posterior $\pi(\tau|\mu,\mathbf{y})$ we can remove the first exponential term from the equation and it will still be true (owing to the $\propto$ sign rather than the equals sign). It's important that you get this: the conditional posterior asks, given that we know $\mu$, what is $\tau$? So $\mu$ is known (and hence fixed). The kernel for the conditional posterior for $\tau$ then becomes (after some re-arranging): \begin{equation} \pi(\tau|\mu,\mathbf{y}) \propto \tau^{\frac{n}{2}+\alpha_0-1}\exp\{-\tau(\frac{1}{2} \sum_{i=1}^{n}(y_i-\mu)^2+\beta_0)\}, \end{equation} which is the kernel of the Gamma distribution with the parameters you state.
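As a quick numerical sanity check of this kernel, you can compare it against SciPy's gamma log-density; note that the second Gamma parameter here is a rate, whereas `scipy.stats.gamma` expects `scale` $= 1/\text{rate}$. A sketch with made-up numbers:

```python
import numpy as np
from scipy import stats

# Hypothetical data and hyperparameters, chosen only to exercise the formula
y = np.array([1.2, 0.7, 2.1, 1.5])
mu, alpha0, beta0 = 1.0, 2.0, 1.0
n, Sn = len(y), float(np.sum((y - mu) ** 2))

taus = np.linspace(0.1, 5.0, 200)
# Unnormalised log-kernel from the derivation above
log_kernel = (n / 2 + alpha0 - 1) * np.log(taus) - taus * (Sn / 2 + beta0)
# Ga(alpha0 + n/2, rate = beta0 + Sn/2) via scipy (scale = 1/rate)
log_pdf = stats.gamma.logpdf(taus, a=alpha0 + n / 2, scale=1.0 / (beta0 + Sn / 2))
# The difference should be constant in tau (it is the log normalising constant)
print(np.ptp(log_kernel - log_pdf))  # ~0 up to floating-point error
```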

Now, for $\pi(\mu|\tau,\mathbf{y})$ you do the same thing, but it's a bit trickier because you have to complete the square. After cancelling the relevant terms not involving $\mu$ you get: \begin{equation} \pi(\mu|\tau,\mathbf{y}) \propto \exp\{ -\frac{\tau}{2} \sum_{i=1}^{n} (y_i-\mu)^2 - \frac{\tau_0}{2}(\mu-\mu_0)^2\}. \end{equation} If you multiply out the squared brackets and take the relevant summations, then remove all terms not involving $\mu$ (as they all just multiply the kernel by a constant), this becomes: \begin{equation} \pi(\mu|\tau,\mathbf{y}) \propto \exp\{ -\frac{1}{2}[\mu^2(n\tau+\tau_0) - 2\mu(n\bar{y}\tau + \tau_0\mu_0)]\}. \end{equation} If you complete the square here you get the kernel of a Gaussian with the mean and variance you state.
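Spelled out, with the shorthand $a = n\tau+\tau_0$ and $b = n\bar{y}\tau+\tau_0\mu_0$ (my notation, introduced only for this step), completing the square gives
\begin{equation}
-\frac{1}{2}(a\mu^2 - 2b\mu) = -\frac{a}{2}\Big(\mu - \frac{b}{a}\Big)^2 + \frac{b^2}{2a},
\end{equation}
and since $b^2/(2a)$ does not involve $\mu$ it is absorbed into the proportionality constant, leaving the kernel of an $N(b/a, a^{-1})$ distribution, i.e. exactly the mean and variance stated in the question.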

Hope that helps.

Sam Livingstone
  • Thank you so much! That's a superb answer. I really appreciate it. – user9171 Nov 09 '12 at 12:02
  • :-) no problem. – Sam Livingstone Nov 09 '12 at 12:17
  • Sorry, but I have just one more question. In relation to the completing-the-square section, is $\mu$ the variable with respect to which we are completing the square? If so, how do we get the expression in the general form: $$ a\mu^{2} + b\mu + c = 0 $$ It seems that every term will contain $\mu$ or $\mu^{2}$, which leaves no constant term. – user9171 Nov 09 '12 at 13:03
  • 1
    Yeah, so what you have is $a\mu^2 + b\mu$. But the $c$ term is constant with respect to $\mu$, so you can add or subtract it as you please without losing the validity of the expression. If you have $\exp\{ a\mu^2 + b\mu + c\} = \exp\{c\}\exp\{ a\mu^2 + b\mu\}$ then you can remove and add in the $\exp\{c\}$ as you like, since it is fixed. Does that make sense/help? – Sam Livingstone Nov 09 '12 at 13:56
  • 1
    So to clarify, if you write it in the form $a(\mu + h)^2$, you will end up with an extra constant term at the end, but that's ok because we just need something proportional to what we had. – Sam Livingstone Nov 09 '12 at 13:59
  • Oh, of course! I should have spotted that. The proportionality aspect really is responsible for quite a lot. Thanks again! – user9171 Nov 09 '12 at 14:05
  • Sorry to labour this question, but I've posted what I believe is a reasonably similar question about data augmentation. Unfortunately, it hasn't received very much attention. Perhaps you have some experience in this area. If you would like to view the question, it can be found at: http://stats.stackexchange.com/questions/43488/mcmc-and-data-augmentation Thanks! – user9171 Nov 13 '12 at 13:53
  • Hello, I'm a bit busy this evening but if it hasn't been answered by tomorrow morning I'll have a look then. No guarantees I'll know the answer but I'll do my best! – Sam Livingstone Nov 13 '12 at 18:47
  • Hi. There's no rush (or obligation to even look at it, of course). I just figured I'd mention it because you seem to be quite proficient in this area. I appreciate your reply though. Thanks! – user9171 Nov 13 '12 at 18:53
  • Apologies for dragging this post up again, but, recently, I have been required to code the above example in a computer lab. Firstly, I was required to run the sampler with **uninformative** priors on the above parameters (i.e. $\mu_{0} = 0$, $\tau_{0} = \alpha_{0} = \beta_{0} = 0.00001$). After this, I was required to run the sampler with **informative** priors on the above parameters. Without any further information, how do I determine what form the informative priors should take? Perhaps they can be inferred from the initial uninformative sampler's priors? – user9171 Nov 22 '12 at 14:36
  • 1
    Hello. For the uninformative priors you want $\alpha_0 = 1$, not $0.00001$. You can't determine what the informative priors should be the way you said (or in any way like that). In practice you would choose $\alpha_0$ and $\beta_0$ so that the shape of the prior reflected your beliefs. For simulation purposes, I think they just mean try some different values of $\alpha_0$ and $\beta_0$ that will give you a less flat prior and see how your posterior changes. – Sam Livingstone Nov 22 '12 at 17:53
  • Hello. Yes, I figured that the only way to develop an informative prior would be to have some previous information about the process, but I decided to ask just in case. Most probably, it will be like you said: choosing some different values for $\alpha_{0}$ and $\beta_{0}$ with a view to observing how the posterior changes. As always, thank you very much! – user9171 Nov 22 '12 at 19:46
  • By the way, I've posted another Monte Carlo based question which hasn't received much attention. As always, there's no obligation at all to check it out, but if you feel inclined, you might be able to provide some assistance. The question can be found at: http://stats.stackexchange.com/questions/44193/deriving-priors-for-mcmc-implementation . I'm abusing this thread (and your attention) now, so I best leave it at that. Thanks again! – user9171 Nov 22 '12 at 20:05
  • Can you help me see this question https://stats.stackexchange.com/questions/498646/how-to-built-gibbs-sampler-of-mixture-bayesian-regression-in-r? Thank you. – Bob Nov 30 '20 at 06:29