
[EDIT]

This question comes from an example in the OpenBUGS manual:

Stagnant: a changepoint problem and an illustration of how NOT to do MCMC!

I also asked another question regarding this example.

[END EDIT]

In Bayesian analysis, assume a simple linear regression model with two straight lines that meet at a certain changepoint $c$. The basic setup is as follows. \begin{align*} Y_i \ & \sim \ N(\alpha + \beta_1 (x_i - c), \sigma^2), \; \text{for } x_i \leq c \; (i = 1, \ldots, k) \\ Y_i \ & \sim \ N(\alpha + \beta_2 (x_i - c), \sigma^2), \; \text{for } x_i > c \; (i = k+1, \ldots, n) \\ \end{align*} That is, the observed $x$'s are ordered from smallest to largest.
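For concreteness, here is a minimal Python sketch (my own, not part of the original example) that simulates data from this two-line model; the function name and parameter values are arbitrary choices for illustration.

```python
import numpy as np

def simulate_changepoint(n=50, alpha=1.0, beta1=2.0, beta2=-1.0,
                         c=0.5, sigma=0.1, seed=0):
    """Simulate from the two-line changepoint model.

    The x values are sorted, so observations 1..k (those with x_i <= c)
    follow the first line and k+1..n follow the second, as in the setup.
    """
    rng = np.random.default_rng(seed)
    x = np.sort(rng.uniform(0.0, 1.0, n))
    mean = np.where(x <= c,
                    alpha + beta1 * (x - c),
                    alpha + beta2 * (x - c))
    y = mean + rng.normal(0.0, sigma, n)
    return x, y

x, y = simulate_changepoint()
```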

The priors are assumed as: \begin{align*} \alpha \ & \sim \ N(\mu_{\alpha}, \sigma^2_{\alpha}), \quad \sigma^2 \ \sim \ IG(a, b) \\ \beta_1, \beta_2 \ & \sim \ N(\mu_{\beta}, \sigma^2_{\beta}), \quad c \ \sim \ Unif(c_1, c_2) \end{align*} where $c_1, c_2$ are within the observed range of $x$'s.

The data likelihood is then: \begin{align*} & p(\mathbf{x}, \mathbf{y}| c,\alpha, \beta_1, \beta_2, \sigma^2) = \prod_{i=1}^{k} p_1(y_i| c, .) \prod_{i=k+1}^{n} p_2(y_i|c,.) \\ & = (2 \pi \sigma^2)^{-n /2} \exp\left\{- \frac{1}{2 \sigma^2} \sum_{i=1}^k (y_i - \alpha - \beta_1 (x_i - c)) ^ 2 \right\} \\ & \quad \times \exp\left\{- \frac{1}{2 \sigma^2} \sum_{i=k+1}^n (y_i - \alpha - \beta_2 (x_i - c)) ^ 2 \right\} \end{align*}
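The log of this likelihood is straightforward to evaluate numerically; a sketch in Python (names are my own, assuming the simulated `x`, `y` arrays above):

```python
import numpy as np

def log_likelihood(x, y, c, alpha, beta1, beta2, sigma2):
    """Log of the two-segment likelihood; the split index k is
    implicitly determined by which x_i fall at or below c."""
    left = x <= c
    resid = np.where(left,
                     y - alpha - beta1 * (x - c),
                     y - alpha - beta2 * (x - c))
    n = len(x)
    return (-0.5 * n * np.log(2 * np.pi * sigma2)
            - np.sum(resid ** 2) / (2 * sigma2))
```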

And the prior for $c$ is $$ c \ \sim \ Unif(c_1, c_2) $$

My question is, how do I sample from the posterior distribution of $c$ in MCMC?


I can get the full conditional for $c$ as \begin{align*} & p(c|.) \propto p(\mathbf{x}, \mathbf{y}| .) p(c) \\ & \propto \exp\left\{-\frac{1}{2\sigma^2} \Big(\beta_1^2 \sum_1^k c^2 + 2c \sum_1^k \beta_1 (y_i - \alpha - \beta_1 x_i) \Big) \right\} \\ & \quad \times \exp\left\{-\frac{1}{2\sigma^2} \Big(\beta_2^2 \sum_{k+1}^n c^2 + 2 c \sum_{k+1}^n \beta_2 (y_i - \alpha - \beta_2 x_i) \Big) \right\} \times \mathbf{I}(c_1, c_2)\\ & \propto \exp\left\{-\frac{1}{2\sigma^2} \Big(c^2 (k\beta_1^2 + (n-k)\beta_2^2) + 2 c \Delta \Big) \right\} \times \mathbf{I}(c_1, c_2) \\ & \ \sim \ N(\mu, \Sigma) \times \mathbf{I}(c_1, c_2) \end{align*} where $\Sigma = \frac{\sigma^2}{k\beta_1^2 + (n-k)\beta_2^2}$ and $\mu = \frac{- \Delta}{k\beta_1^2 + (n-k)\beta_2^2}$ with $$ \Delta = \beta_1 \sum_1^k (y_i - \alpha - \beta_1 x_i) + \beta_2 \sum_{k+1}^n (y_i - \alpha - \beta_2 x_i) $$

Is the full conditional a normal distribution truncated to the interval $(c_1, c_2)$? (I don't think that's the case, though.) Could I keep drawing $c$ from $N(\mu, \Sigma)$ within the Gibbs sampler until it falls in the range and then accept? If not, how should I deal with it?

I thought of Metropolis-Hastings sampling, but that's mostly used for unconstrained parameters, so I thought it wouldn't work here.

Any solutions? Thanks very much.

SixSigma
  • The conditional posterior on $c$ is then proportional to the likelihood restricted to the interval $(c_1,c_2)$. This is a well-defined function that you can compute and hence simulate by a Metropolis-within-Gibbs step. Metropolis-Hastings also works for intervals, no issue there. – Xi'an May 05 '15 at 18:31
  • There is something strange with your model: is $k$ known? I do not think so hence $k=k(c)$, which means $\mu=\mu(c)$ and $\Delta=\Delta(c)$ and $\Sigma=\Sigma(c)$. – Xi'an May 05 '15 at 18:33
  • Since you will have to obtain results from MCMC, is there any justification in using a uniform prior for $c$ to begin with? Why not give it an improper prior or just use a "mostly noninformative" normal prior? – AdamO May 05 '15 at 18:36
  • @Xi'an, thanks for the comments. If using Metropolis-Hastings, what proposal distribution should I use? I have trouble with it since $c$ is restricted to the interval. And you're right that $k=k(c)$. For a new $c$, $k$ is updated as the number of observed $x$'s that are less than $c$, as shown in the model setup. – SixSigma May 05 '15 at 18:40
  • @AdamO, since $c$ is the changepoint, it's reasonable to assume that $c$ is constrained in a range which is no greater than the range of observed $x$ values. That's why I use Uniform prior for $c$ – SixSigma May 05 '15 at 18:41
  • @AaronZeng: nothing prevents you from using a random walk proposal,$$\mathcal{U}(c^{(t)}-\xi,c^{(t)}+\xi)$$where $\xi$ is a fraction of $c_2-c_1$. – Xi'an May 05 '15 at 19:21
  • @AaronZeng But the *prior* for $c$ shouldn't really be informed by the data, should it? – AdamO May 05 '15 at 19:31

1 Answer


I've figured out this problem, and I thought it would be helpful to anyone interested in this question if I explain it a bit here.

First and foremost, the full conditional $p(c|.)$ derived in the Question part is not correct!

In my question, I derived the full conditional as follows. \begin{align*} & p(c|.) \propto p(\mathbf{x}, \mathbf{y}| .) p(c) \\ & \propto \exp\left\{-\frac{1}{2\sigma^2} \Big(\beta_1^2 \sum_1^k c^2 + 2c \sum_1^k \beta_1 (y_i - \alpha - \beta_1 x_i) \Big) \right\} \\ & \quad \times \exp\left\{-\frac{1}{2\sigma^2} \Big(\beta_2^2 \sum_{k+1}^n c^2 + 2 c \sum_{k+1}^n \beta_2 (y_i - \alpha - \beta_2 x_i) \Big) \right\} \times \mathbf{I}(c_1, c_2)\\ & \propto \exp\left\{-\frac{1}{2\sigma^2} \Big(c^2 (k\beta_1^2 + (n-k)\beta_2^2) + 2 c \Delta \Big) \right\} \times \mathbf{I}(c_1, c_2) \\ \end{align*} where $ \Delta = \beta_1 \sum_1^k (y_i - \alpha - \beta_1 x_i) + \beta_2 \sum_{k+1}^n (y_i - \alpha - \beta_2 x_i) $.

But the exponential part can no longer be recognized as a normal kernel (as I did in the Question part), because $k$ is now a function of $c$, as pointed out by @Xi'an. Think about the sampling process: each time $c$ is sampled, $k$ must be updated as $k = \#\{i : x_i \leq c\}$ by definition. Therefore, the exponential part actually consists of several piecewise exponential curves, with discontinuities at the observed $x$ values.
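To make the dependence on $k$ concrete, here is a sketch (my own Python, not the linked R code) of the unnormalized log full conditional; note that the split is recomputed from $c$ on every call, which is what creates the jumps at the observed $x$ values.

```python
import numpy as np

def log_cond_c(c, x, y, alpha, beta1, beta2, sigma2, c1, c2):
    """Unnormalized log p(c | .). Because k = #{i : x_i <= c} changes
    with c, this function is only piecewise smooth in c."""
    if not (c1 < c < c2):
        return -np.inf          # the Unif(c1, c2) prior is zero outside
    left = x <= c               # k is implicitly recomputed here
    resid = np.where(left,
                     y - alpha - beta1 * (x - c),
                     y - alpha - beta2 * (x - c))
    return -np.sum(resid ** 2) / (2 * sigma2)
```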

Nevertheless, we can still sample $c$ from this full conditional via slice sampling or rejection sampling. I've implemented the whole sampling procedure for all parameters: the other parameters (i.e., $\alpha, \beta_1, \beta_2, \sigma^2$) were sampled via Gibbs sampling, while $c$ can be sampled via either slice sampling or rejection sampling. For anyone who is interested, you can find my R implementation here. The sampling procedure and the implementation are correct, since the code reproduces the results from OpenBUGS.
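As one possible illustration of the rejection-sampling route (a Python sketch under my own assumptions, not the linked R code), one can use a uniform proposal on $(c_1, c_2)$ with an envelope constant approximated by the maximum of the unnormalized conditional on a fine grid; since the maximum is only grid-approximated, this sketch can in principle slightly clip the peak.

```python
import numpy as np

def sample_c_rejection(x, y, alpha, beta1, beta2, sigma2, c1, c2,
                       rng, grid_size=200):
    """Rejection sampling for c with a Unif(c1, c2) proposal.

    Accept a proposal c with probability p(c|.) / max p(c|.), where the
    maximum of the unnormalized conditional is approximated on a grid.
    """
    def log_cond(c):
        left = x <= c
        resid = np.where(left,
                         y - alpha - beta1 * (x - c),
                         y - alpha - beta2 * (x - c))
        return -np.sum(resid ** 2) / (2 * sigma2)

    grid = np.linspace(c1, c2, grid_size)
    log_max = max(log_cond(c) for c in grid)
    while True:
        c = rng.uniform(c1, c2)
        if np.log(rng.uniform()) < log_cond(c) - log_max:
            return c
```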

One issue I still have, though, is that I also tried Metropolis-Hastings sampling for $c$ with a uniform proposal, but it's not working at this point. (See the part of the code for M-H sampling here. Note that delta and cand.delta in the code are the whole exponent, which differs from my notation here.) I suspect the reason is that I haven't found the right proposal distribution for M-H sampling. I would be very grateful to anyone who can help figure out how to make Metropolis-Hastings sampling work for this problem.
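For reference, the random-walk proposal suggested in the comments, $c' \sim \mathcal{U}(c^{(t)}-\xi, c^{(t)}+\xi)$ with $\xi$ a fraction of $c_2 - c_1$, can be sketched as follows (my own Python, with a hypothetical step size; proposals outside $(c_1, c_2)$ are simply rejected because the prior density there is zero).

```python
import numpy as np

def mh_step_c(c_curr, x, y, alpha, beta1, beta2, sigma2, c1, c2,
              rng, xi=0.05):
    """One Metropolis-within-Gibbs step for c with a symmetric uniform
    random-walk proposal of half-width xi."""
    def log_cond(c):
        if not (c1 < c < c2):
            return -np.inf      # zero prior density outside (c1, c2)
        left = x <= c
        resid = np.where(left,
                         y - alpha - beta1 * (x - c),
                         y - alpha - beta2 * (x - c))
        return -np.sum(resid ** 2) / (2 * sigma2)

    c_prop = rng.uniform(c_curr - xi, c_curr + xi)
    # The proposal is symmetric, so the acceptance ratio reduces to the
    # ratio of conditional densities.
    if np.log(rng.uniform()) < log_cond(c_prop) - log_cond(c_curr):
        return c_prop
    return c_curr
```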

SixSigma
  • Your posterior as expressed is on all the parameters, not on $c$ only. So you need to integrate out the other parameters, which is feasible here because it is a linear model. See for instance the derivation of the exact marginal likelihood in our book [Bayesian Essentials with R](http://tinyurl.com/pq4v9qx). – Xi'an Nov 09 '15 at 14:16