
I have been looking at an MCMC data augmentation question; the general form of the question is as follows:

Suppose data gathered on a process suggests $X_{i} \sim \text{Pois}(\lambda)$ and a prior for the rate parameter is suggested as $\lambda \sim \text{Exp}(\lambda_{0})$. The data is recorded and presented in a typical form (i.e. the number of occurrences of each value of $X_{i}$ from $0$ to $n$); however, the data gathered does not discriminate between instances where $X_{i} \leq 1$ (i.e. all occurrences where $X_{i} = 0$ and $X_{i} = 1$ are grouped into one category).
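
For concreteness, aggregated counts of this form could be simulated as follows; the rate and sample size here are made up purely for illustration and are not part of the actual data in the question:

```python
import numpy as np

rng = np.random.default_rng(42)
true_lambda = 2.0                         # hypothetical rate, illustration only
x = rng.poisson(true_lambda, size=100)    # hypothetical Poisson sample

# Tabulate the number of occurrences of each value, with 0 and 1
# merged into a single category (recorded here under the key 1).
merged = np.maximum(x, 1)
values, freqs = np.unique(merged, return_counts=True)
counts = dict(zip(values.tolist(), freqs.tolist()))
print(counts)   # value -> frequency; the individual 0s and 1s are no longer recoverable
```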

Given the data, the likelihood and the prior described above, the question asks for:

  • The posterior form of $\lambda$,

  • The number of occurrences where $X_{i} = 0$.

I'm not really sure how to answer this question, but I am aware that Gibbs Sampling can be used in data augmentation. Does anybody have any information on how this could be done?

EDIT:

I should specify that it's primarily the second part (the number of occurrences where $X_{i} = 0$) that I'm unsure about. For the first part (the posterior form of $\lambda$), given the likelihood and the prior suggested, I have reasoned (although I'm happy to be corrected):

Given:

$$ \pi(\lambda|\vec{x}) \propto p(\vec{x}|\lambda) \times p(\lambda) $$

So, for the model given above:

$$ \pi(\lambda|\vec{x}) \propto \frac{\lambda^{\sum_{i=1}^{n}x_{i}}}{\prod_{i=1}^{n}x_{i}!}e^{-n\lambda} \times \lambda_{0}e^{-\lambda \lambda_{0}} $$

Simplifying yields:

$$ \pi(\lambda|\vec{x}) \propto \frac{\lambda^{\sum_{i=1}^{n}x_{i}}}{\prod_{i=1}^{n}x_{i}!}e^{-\lambda(n+\lambda_{0})}\lambda_{0} $$

which is proportional to (and hence the posterior form is given by):

$$ \pi(\lambda|\vec{x}) \propto \lambda^{\sum_{i=1}^{n}x_{i}}e^{-\lambda(n+\lambda_{0})}\lambda_{0} $$
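
Up to the constant factor $\lambda_{0}$, this is the kernel of a Gamma distribution in the (shape, rate) parameterisation:

$$ \lambda \mid \vec{x} \sim \mathcal{G}\left(\sum_{i=1}^{n}x_{i}+1,\; n+\lambda_{0}\right) $$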


1 Answer


Your answer does not account for the fact that the observations equal to zero and to one are merged together: what you computed is the posterior for the complete Poisson data, $(X_1,\ldots,X_n)$, rather than for the aggregated or merged data, $(X_1^*,\ldots,X^*_n)$.

If we take the convention that the observation $X_i^*=1$ corresponds to $X_i=0$ or $X_i=1$, and the observation $X_i^*=k>1$ corresponds to $X_i=k$, the posterior density of $\lambda$ given the observed vector $(X_1^*,\ldots,X^*_n)$ is (after some algebra and factorisation) $$ \pi(\lambda|x_1^*,\ldots,x^*_n) \propto \lambda^{\sum_{i=1}^n x_i^*\mathbb{I}(x_i^*>1)} \exp\{-\lambda(\lambda_0+n)\} \times \{1+\lambda\}^{n_1} $$ where $n_1$ is the number of times the $x_i^*$'s are equal to one. The last term in braces comes from the probability $e^{-\lambda}(1+\lambda)$ of getting $0$ or $1$ in a Poisson draw, with the factor $e^{-\lambda}$ absorbed into the exponential.

So this is your true/observed posterior. From there, you can implement a Gibbs sampler (sketched in code after the two steps below) by

  1. Generating the "missing observations" given $\lambda$ and the observations, namely simulating $p(x_i|\lambda,x_i^*=1)$, which is given by $$ \mathbb{P}(x_i=0|\lambda,x_i^*=1)=1-\mathbb{P}(x_i=1|\lambda,x_i^*=1)=\dfrac{1}{1+\lambda}\,. $$
  2. Generating $\lambda$ given the "completed data", which amounts to $$ \lambda|x_1,\ldots,x_n \sim \mathcal{G}(\sum_i x_i + 1,n+\lambda_0) $$ as you already computed.
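
A minimal sketch of this two-step sampler, under the assumption (not from the original post) that the observed data are summarised as a dictionary `counts` mapping each recorded value $x^*$ to its frequency, with the key 1 standing for the merged "0 or 1" category; NumPy's Gamma generator is parameterised by shape and scale, so the rate $n+\lambda_0$ enters through its reciprocal:

```python
import numpy as np

def gibbs_sampler(counts, lambda_0, n_iter=5000, lam_init=1.0, seed=0):
    """Gibbs sampler for the merged-Poisson model sketched above.

    counts   -- dict mapping each recorded value x* to its frequency;
                the key 1 stands for the merged "0 or 1" category (assumption).
    lambda_0 -- rate of the Exp(lambda_0) prior on lambda.
    """
    rng = np.random.default_rng(seed)
    n = sum(counts.values())                                   # total number of observations
    n1 = counts.get(1, 0)                                      # merged "0 or 1" cases
    sum_big = sum(k * v for k, v in counts.items() if k > 1)   # sum of the x_i* recorded as > 1

    lam = lam_init
    lam_draws = np.empty(n_iter)
    n0_draws = np.empty(n_iter, dtype=int)

    for t in range(n_iter):
        # Step 1: impute the merged observations. Each is 0 with probability
        # 1/(1+lambda), so the number of zeros among the n1 merged cases is
        # Binomial (equivalent to simulating each such x_i individually).
        n0 = rng.binomial(n1, 1.0 / (1.0 + lam))
        sum_x = sum_big + (n1 - n0)                            # completed sum of the x_i

        # Step 2: lambda | completed data ~ Gamma(sum_i x_i + 1, n + lambda_0);
        # NumPy uses (shape, scale), so the rate becomes 1 / (n + lambda_0).
        lam = rng.gamma(sum_x + 1, 1.0 / (n + lambda_0))

        lam_draws[t] = lam
        n0_draws[t] = n0

    return lam_draws, n0_draws

# Hypothetical counts, illustration only: 20 merged "0 or 1" cases, 15 twos, 7 threes, 3 fours.
lam_draws, n0_draws = gibbs_sampler({1: 20, 2: 15, 3: 7, 4: 3}, lambda_0=1.0)
print(lam_draws[1000:].mean(), n0_draws[1000:].mean())   # posterior means after burn-in
```

Averaging the `n0` draws after a burn-in gives an estimate for the second part of the question, the number of occurrences where $X_i = 0$.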

(If you want more details, Example 9.7, p.346, in my Monte Carlo Statistical Methods book with George Casella covers exactly this setting.)

Xi'an
  • Firstly, thank you very much for your reply. I just have a few short questions. – user9171 Nov 13 '12 at 22:29
  • (1) How do we get $\mathbb{P}(x_{i}=0|\lambda,x_{i}^{*}=1)=\frac{1}{1+\lambda}$? – user9171 Nov 13 '12 at 22:30
  • (2) For this example, to begin the Gibbs sampler, we will require initial values for $\lambda$ and $\lambda_{0}$; can we choose any values or is there any protocol we should follow to select initial values? – user9171 Nov 13 '12 at 22:30
  • (3) When we're sampling from the $\text{Ga}(\sum_{i}x_{i}+1,n+\lambda_{0})$ posterior, we will require the sum of the instances of each value (including the number of instances where $X_{i} = 1$). Obviously, this value isn't readily available. How should we compute it? Given that we will compute an estimate for the number of instances where $X_{i} = 0$ on each iteration of the sampler, can we simply use $$n_{1}=n_{0,1}-n_{0},$$ where $n_{0,1}$, $n_{0}$ and $n_{1}$ are the number of instances where $X_{i}=1\text{ or }0$, $X_{i}=0$ and $X_{i}=1$ respectively? – user9171 Nov 13 '12 at 22:30
  • 2
    (2) Any MCMC algorithm can start with arbitrary values because the Markov chain is recurrent, this is the core idea behind Markov chain Monte Carlo methods. Note that $\lambda_0$ is a parameter of the prior: it is chosen a priori and does not change once the data is observed. – Xi'an Nov 14 '12 at 09:42
  • 2
    (3) When sampling from the Gamma distribution in step 2 of the Gibbs sampler, note that I condition upon the complete data, generated on step 1 of the Gibbs sampler. I thus "know" every value of the $x_i$'s, even those for which $x_i^*=1$. Please try to understand the distinction between the $x_i$'s and the $x_i^*$'s, this is the fundamental idea behind the data augmentation principle. – Xi'an Nov 14 '12 at 09:45
  • I know that I should have grasped the concept by now, and I'm sure this is trying your patience, but I'm still unsure about a few things. – user9171 Nov 15 '12 at 14:54
  • (1) Why is $\{\lambda+1\}$ raised to $n_{1}$? – user9171 Nov 15 '12 at 14:55
  • (2) I'm really not sure how the following is achieved. $\mathbb{P} (x_{i}=0| \lambda, x_{i}^{*}=1) = \frac{1}{1+\lambda}$ – user9171 Nov 15 '12 at 14:55
  • (3) I understand that the $x_{i}^{*}$'s are the observed data whilst the $x_{i}$'s are for the complete data, but how can we sum across ($\sum_{i}x_{i}$) if we don't know the full extent of the data (i.e. how many of those $x_{i}$'s are equal to $0$ and how many are equal to $1$)? – user9171 Nov 15 '12 at 14:55
  • 2
    (1) The $[\{\lambda+1\}\exp(-\lambda)]^{n_1}$ part corresponds to the grouped observations. – Xi'an Nov 16 '12 at 20:33
  • 2
    (2) This is conditional probability (please try to do the math by yourself): $\mathbb{P}(x_i=0|\lambda,x^∗_i=1)=\mathbb{P}(x_i=0,x^∗_i=1|\lambda)/\mathbb{P}(x^∗_i=1|\lambda)=\mathbb{P}(x_i=0|\lambda)/\mathbb{P}(x^∗_i=1|\lambda)$ – Xi'an Nov 16 '12 at 20:37
  • 2
    (3) Gibbs sampling works by conditionals. So on step 2, we condition on the $x_i$'s we simulated in step 1 (and in step 1 on the $\lambda$ we simulated in step 2). This means **those $x_i$'s are known** (even though they will change at the next iteration) and so is the sum. You definitely need to read some introduction to Gibbs if this fundamental point remains unclear to you... – Xi'an Nov 16 '12 at 20:40
  • Oh, sorry about that. Although I had recognized the distinction between the $x_{i}$'s and $x_{i}^{*}$'s, I had been reading the statements incorrectly. I believe what follows is correct. $$ \mathbb{P}(x_{i} = 0|\lambda, x_{i}^{*}=1) = \frac{e^{-\lambda}}{e^{-\lambda}\{1+\lambda\}} = \frac{1}{1+\lambda} $$ Hence, $ f_{0} = \left(\frac{1}{1+\lambda}\right) \times n_{1} $ and $ f_{1} = \left(\frac{\lambda}{1+\lambda}\right) \times n_{1} $, where $f_{0}$ and $f_{1}$ are the expected numbers of occurrences of $0$ and $1$ respectively and $n_{1}$ is the number of occurrences of $0$ *or* $1$. – user9171 Nov 20 '12 at 00:10
  • By the way, thank you for all your assistance - it's been invaluable! – user9171 Nov 22 '12 at 14:24