
I am quite new to sampling. I am doing Gibbs sampling for a Bayesian network. I am aware of the algorithm for Gibbs sampling, but there is one thing I am not able to understand.

For example, let's assume you have 3 variables $x_1, x_2, x_3$, which means you need to generate samples of the form $(x_1,x_2,x_3)$. Suppose also that all three variables can take the values 0, 1, or 2. Now, let's take the initial value of the sample as $(2,1,1)$ and compute $\mathbb P(x_1=0 | x_2=1,x_3=1)$, $\mathbb P(x_1=1 |x_2=1,x_3=1)$, and $\mathbb P(x_1=2 |x_2=1,x_3=1)$.

Now, I am selecting the value with the largest probability among these 3, assigning it to $x_1$, and using it for the next sample. I want to ask whether this method of generating samples is correct, or whether I am doing something wrong.

  • This is not the idea of [Gibbs sampling](http://en.wikipedia.org/wiki/Gibbs_sampling). What you have to do is obtain the conditional distributions $X_1\vert X_2, X_3$; $X_2\vert X_1, X_3$; and $X_3\vert X_1, X_2$. Once you have these, initialise the algorithm at $(x_1^0,x_2^0,x_3^0)$ and use them to simulate from $X_1\vert x_2^0,x_3^0$ to get $x_1^1$, then from $X_2\vert x_1^1,x_3^0$ to get $x_2^1$, and from $X_3\vert x_1^1,x_2^1$ to get $x_3^1$, obtaining $(x_1^1,x_2^1,x_3^1)$. Repeat this step, now starting from the sample $(x_1^1,x_2^1,x_3^1)$, and so on. With this, you will obtain a sample $(x_1^j,x_2^j,x_3^j)$, $j=1,...,N$, from the joint distribution. –  Dec 14 '12 at 23:12
  • In Gibbs sampling you do not have to calculate that probability. The only thing you need is to be able to simulate from the conditional distributions and to conduct the iterative process. –  Dec 14 '12 at 23:19
  • @Procrastinator: I didn't get you. Can you please elaborate a little? Getting the conditional distribution will involve calculating the conditional probabilities given all the other parameters. Am I right? – Manish Dec 14 '12 at 23:24
  • @Procrastinator - I'm thinking you're going to have to answer this question despite your best efforts to do it via comments :) – jbowman Dec 14 '12 at 23:25
  • @jbowman I was very tempted to answer this question but then I found this one: [Can someone explain Gibbs sampling in very simple words?](http://stats.stackexchange.com/q/10213/10525). –  Dec 14 '12 at 23:53
  • @Procrastinator - wow, that's one awesome answer! – jbowman Dec 15 '12 at 00:22

2 Answers


I have posted a link to a previous great answer, which is more intuitive in nature. Here I am going to focus on the technical details and provide a simple example implemented in R.

Gibbs sampling is a Markov Chain Monte Carlo (MCMC) method that is used to simulate from a random vector $(X_1,...,X_n)$. Suppose that you want to obtain a simulated sample of size $N$ from this vector and denote the $j$th simulation as $(x_1^j,...,x_n^j)$. Assume also that you know the conditional distributions $X_j\vert X_1,...,X_{j-1},X_{j+1},...,X_n$ for $j=1,...,n$ and that you can simulate from them.

The Gibbs sampler is defined as the following iterative algorithm:

  1. Start at the initial value $(x_1^0,...,x_n^0)$.
  2. For $k=1,...,N$, obtain a simulation from each conditional $X_j^{k}\vert x_1^{k},...,x_{j-1}^k,x_{j+1}^{k-1},...,x_{n}^{k-1}$, for $j=1,...,n$. For this step it is essential to be able to simulate from each univariate conditional of the model parameters of interest.

With this algorithm you will obtain a sample $\{(x_1^j,...,x_n^j)\}$ of size $N$ from the joint distribution of $(X_1,...,X_n)$. Note that there is no need to evaluate any probability.

To ensure that the sample is approximately independent and comes from the target distribution, you have to discard the first $M$ samples for a suitably large $M$ (this is called burn-in) and also keep only every $m$th iteration afterwards (this is called thinning).

In your particular case, you have to be able to simulate from the conditional distributions of $X_1\vert X_2,X_3$; $X_2\vert X_1,X_3$; and $X_3\vert X_1,X_2$.

A toy example

Suppose that you want to simulate from $(X_1,X_2)$ distributed as a bivariate normal with mean $(0,0)$ and covariance matrix $\begin{bmatrix}1 & \rho\\ \rho & 1\end{bmatrix}$ with $\rho=0.5$. It is known that the conditional distributions are given by

$$X_1\vert X_2 \sim \mbox{Normal}(\rho X_2,1-\rho^2),\\ X_2\vert X_1 \sim \mbox{Normal}(\rho X_1,1-\rho^2).$$
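These are instances of the standard formula for the conditionals of a bivariate normal: in general, $X_1\vert X_2=x_2 \sim \mbox{Normal}\left(\mu_1+\rho\frac{\sigma_1}{\sigma_2}(x_2-\mu_2),\,(1-\rho^2)\sigma_1^2\right)$, which reduces to the expression above when $\mu_1=\mu_2=0$ and $\sigma_1=\sigma_2=1$.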

Note that these distributions are easy to simulate from, so we can easily implement a Gibbs sampler. The following R code shows how to do this.

N = 26000 # Total number of iterations
rho = 0.5

simx = simy = vector()
simx[1] = simy[1] = 0 # Initialise vectors

# Gibbs sampler
for(i in 2:N){
  simx[i] = rnorm(1,rho*simy[i-1],sqrt(1-rho^2)) # draw from X1 | X2 = simy[i-1]
  simy[i] = rnorm(1,rho*simx[i],sqrt(1-rho^2))   # draw from X2 | X1 = simx[i]
}

# burnin and thinning
burnin = 1000
thinning = 25

# Gibbs sample
keep = seq(from=burnin,to=N,by=thinning)
sim = cbind(simx[keep],simy[keep])

# Trace plots: a diagnostic tool to assess the convergence of the sampler
plot(ts(sim[,1]))
plot(ts(sim[,2]))

# Plots of the marginals and the joint distribution

hist(sim[,1])
hist(sim[,2])

plot(sim,col=1:1000)
plot(sim,type="l")

As you can see, there is no need to evaluate the conditional densities or distributions; the only thing we need to do is simulate from the corresponding conditionals. In your case you just need to simulate from the aforementioned conditionals and conduct an analogous iterative algorithm, as in the sketch below.
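For concreteness, here is a minimal sketch of that discrete analogue, assuming a hypothetical (made-up) joint probability table `p` over three variables taking values 0, 1, or 2; in your network this table would be replaced by the distributions implied by your model. Each conditional is just a renormalised slice of the joint, and `sample()` draws from it.

set.seed(1)
vals = 0:2
p = array(runif(27), dim=c(3,3,3)) # hypothetical unnormalised joint P(x1,x2,x3)

N = 5000
x = matrix(NA, nrow=N, ncol=3)
x[1,] = c(2,1,1) # initial value, as in the question

for(k in 2:N){
  x1 = x[k-1,1]; x2 = x[k-1,2]; x3 = x[k-1,3]
  # draw x1 from P(x1 | x2, x3): renormalise a slice of the joint table
  w = p[, x2+1, x3+1]
  x1 = sample(vals, 1, prob=w/sum(w))
  # draw x2 from P(x2 | x1, x3), using the freshly updated x1
  w = p[x1+1, , x3+1]
  x2 = sample(vals, 1, prob=w/sum(w))
  # draw x3 from P(x3 | x1, x2)
  w = p[x1+1, x2+1, ]
  x3 = sample(vals, 1, prob=w/sum(w))
  x[k,] = c(x1,x2,x3)
}

table(x[,1])/N # empirical marginal distribution of x1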

  • In the above code, where are you using the conditional distribution? Is it the `rnorm(1,rho*simy[i-1],sqrt(1-rho^2))` part? – Manish Dec 15 '12 at 00:41
  • @Manish For instance in the conditional $X_1\vert X_2 \sim N(\rho X_2,1-\rho^2)$. This is reflected in `simx[i] = rnorm(1,rho*simy[i-1],sqrt(1-rho^2))` at the $i$th iteration. Note that `simx[i]` is based on the previous value `simy[i-1]` as specified in the steps of the algorithm. –  Dec 15 '12 at 00:43
  • OK, I think I am getting it: given the probability distribution, `rnorm` selects a random number. – Manish Dec 15 '12 at 00:43
  • In my case I have a discrete conditional distribution, so I need to devise a function which can draw a random number from this distribution. – Manish Dec 15 '12 at 00:51
  • @Manish Indeed, in order to use a Gibbs sampler, whether in the discrete or the continuous case, you need to know the conditionals and be able to simulate from them. Otherwise you would need to use more sophisticated versions like "Metropolis within Gibbs" or another *ad hoc* variant. Do you know these distributions? –  Dec 15 '12 at 00:54
  • Not really; the problem I am working on is an application of a Bayesian network, and for this network I need to generate samples via Gibbs sampling. I have read about Gibbs sampling before but always missed that point. – Manish Dec 15 '12 at 00:59
  • @Manish If you are working with a Bayesian network and it seems your nodes take discrete values, then your conditional distribution is simply a multinomial distribution, isn't it? – TenaliRaman Dec 15 '12 at 04:01
  • @TenaliRaman: Exactly.. – Manish Dec 15 '12 at 06:18
  • @Manish Then you probably want to look at this - http://en.wikipedia.org/wiki/Multinomial_distribution#Sampling_from_a_multinomial_distribution – TenaliRaman Dec 15 '12 at 09:50
  • Nice answer @Procrastinator (+1), I like answers with example code! – Elvis Dec 15 '12 at 13:51

In your simple example, once you've computed $\mathbb P(x_1=0 | x_2=1,x_3=1)=p_{0,11}$, $\mathbb P(x_1=1 |x_2=1,x_3=1)=p_{1,11}$ and $\mathbb P(x_1=2 |x_2=1,x_3=1)=p_{2,11}$, you draw $x_1=0$ with probability $p_{0,11}$, $x_1=1$ with probability $p_{1,11}$, and $x_1=2$ with probability $p_{2,11}$. You can operationalize this as follows:

  1. Draw $U\sim U[0,1]$
  2. If $U \le p_{0,11}$, assign $x_1=0$, and you are done.
  3. If $p_{0,11} < U \le p_{0,11} + p_{1,11}$, assign $x_1=1$, and you are done.
  4. If $ p_{0,11} + p_{1,11}<U$, assign $x_1=2$.

It is easy to see that the $x_1$ generated in this way has the required distribution.
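In R, for instance, this scheme takes only a few lines; the probabilities below are hypothetical placeholders for $p_{0,11}$, $p_{1,11}$, $p_{2,11}$:

p = c(0.2, 0.5, 0.3) # hypothetical values of p_{0,11}, p_{1,11}, p_{2,11}
U = runif(1)         # step 1: draw U ~ Uniform[0,1]
if(U <= p[1]){
  x1 = 0             # step 2
} else if(U <= p[1] + p[2]){
  x1 = 1             # step 3
} else {
  x1 = 2             # step 4
}

# Equivalently, in a single call (sample() normalises prob internally):
x1 = sample(0:2, 1, prob=p)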

If you always drew the modal value instead, you would quickly get stuck in a local mode of the joint distribution, which is not what you want.
