I need to estimate the Bayesian posterior of my model parameters ($\theta$) for some observed data ($D$), given a likelihood $P(D|\theta)$ and assumed priors $P(\theta)$:

$$P(\theta|D)= \frac{P(D|\theta)\; P(\theta)}{P(D)}$$
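
Written out, the evidence in the denominator is an integral over the whole parameter space, so it does not depend on $\theta$. As a consequence, a ratio of posterior densities at two parameter values (which is all a Metropolis-type sampler evaluates) needs only likelihood times prior:

$$P(D)=\int P(D|\theta)\,P(\theta)\,d\theta, \qquad \frac{P(\theta'|D)}{P(\theta|D)}=\frac{P(D|\theta')\,P(\theta')}{P(D|\theta)\,P(\theta)}$$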

I use an MCMC algorithm which, as far as I understand, samples the unnormalized posterior (edit: I was wrong, the draws are taken from the full posterior).

After the MCMC is done, I can construct the probability density function for each $\theta_i$ in my set of parameters $\theta$ (from which I can obtain the necessary statistics: mean, median, credible intervals, etc.), but I also have a rather large set of unnormalized posterior values.

As far as I understand, these values are not used at all in the analysis of the model parameters. Does this set of unnormalized posterior values have any use at all, or are they simply discarded?

Gabriel
  • Can you explain a little more what the unnormalized values are? They are not unnormalized values: these values have been drawn from $P(\theta|D)$ without knowing $P(D)$. – Abhinav Gupta Oct 10 '17 at 15:02
  • Oh, I'm just learning about MCMC and it was my understanding that the values drawn from the posterior were unnormalized. Do you have any article/link where this is made clear? – Gabriel Oct 10 '17 at 15:05
    The values drawn from the posterior are not unnormalized; it is just that the algorithm does not need $P(D)$. This paper (http://www.tandfonline.com/doi/pdf/10.1080/00031305.1995.10476177) by Chib and Greenberg gives a good understanding of the algorithm. – Abhinav Gupta Oct 10 '17 at 15:21
  • Thank you @AbhinavGupta! My question (edited to reflect this correction) still stands though. What do I do with the set of posterior probability values? – Gabriel Oct 10 '17 at 15:25
  • I have added a comment to Srikant's answer. Let me know if it is still not clear. – Abhinav Gupta Oct 10 '17 at 15:28

3 Answers

I have to disagree with the earlier answers that the values of the (unnormalised) posterior at the MCMC simulations are of no use. They actually provide a much more refined view of the posterior than a histogram, especially in multiple dimensions. One direct illustration is the construction of an HPD region: the easiest way to construct an HPD region at level $\alpha$ is to take that same percentage of the MCMC simulations with the largest [unnormalised] posterior values and to construct the convex envelope of these simulations.
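
A minimal sketch of this recipe in Python/NumPy, assuming a 2-D parameter, draws stored in an array `draws` of shape `(n, 2)` and the log unnormalised posterior at each draw in `log_post` (hypothetical names). The convex envelope is computed with Andrew's monotone-chain algorithm so no extra library is needed. The toy target is a standard bivariate normal, and i.i.d. draws stand in for MCMC output purely for illustration. Note that taking a convex envelope implicitly assumes the HPD region is convex (e.g. a unimodal, elliptical posterior).

```python
import numpy as np

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(map(tuple, points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hpd_region(draws, log_post, level=0.95):
    """Keep the `level` fraction of draws with the largest (unnormalised)
    posterior values and return the vertices of their convex envelope."""
    n_keep = int(np.ceil(level * len(draws)))
    top = np.argsort(log_post)[::-1][:n_keep]
    return np.array(convex_hull(draws[top]))

# Toy example: standard bivariate normal; i.i.d. draws stand in for MCMC output.
rng = np.random.default_rng(0)
draws = rng.standard_normal((5000, 2))
log_post = -0.5 * np.sum(draws**2, axis=1)  # log density up to an additive constant
hull = hpd_region(draws, log_post, level=0.95)
```

Only the ranking of `log_post` matters here, which is why the unknown constant $m(D)$ is irrelevant to the construction.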

Xi'an
  • I think you are right that the unnormalized posterior might be used for constructing HPD regions with more accuracy than from MCMC samples. But the question is do I really need it after MCMC sampling? – Abhinav Gupta Oct 10 '17 at 16:20
  • Now I'm confused. Are the posterior probability values normalized or not? – Gabriel Oct 10 '17 at 16:21
  • The exact values of the posterior density at the $\theta_i$'s are known up to a constant, the inverse of the marginal density at the observation(s) $D$, $m(D)$. – Xi'an Oct 10 '17 at 16:22
  • The probability density that you estimate from MCMC sample will be normalized. I have edited my answer to include Xi'an's point. – Abhinav Gupta Oct 10 '17 at 16:23
  • @AbhinavGupta: Can you propose an efficient way of computing an HPD without this trick? I'd be very interested in the alternative. – Xi'an Oct 10 '17 at 16:23
  • @AbhinavGupta: The probability density estimated from an MCMC sample is not the true posterior density, while the product prior x likelihood is, up to a constant. The estimated density reflects the outcome of the MCMC, including potential failure to visit some regions of the space and likely high correlation in the simulated values. – Xi'an Oct 10 '17 at 16:26
  • @Xi'an No, I am not aware of any better way of doing it. Therefore, I have edited my answer. – Abhinav Gupta Oct 10 '17 at 16:26
The values obtained after MCMC are a sample from the posterior $P(\theta|D)$. You have to make sure the sample is large enough that it can be treated as the population itself. Assuming it is, you can do anything with it that you would do with the probability density. For example, you can find a 95% credible interval from the $2.5^{th}$ and $97.5^{th}$ percentiles, or compute the mean of $\theta|D$.
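
A minimal sketch of these summaries with NumPy, assuming the MCMC output for one scalar parameter is held in a 1-D array `samples` (a hypothetical name). Here a normal sample stands in for real sampler output:

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.normal(loc=2.0, scale=0.5, size=20_000)  # stand-in for MCMC draws of theta

post_mean = samples.mean()
post_median = np.median(samples)
lo, hi = np.percentile(samples, [2.5, 97.5])  # equal-tailed 95% credible interval
print(f"mean={post_mean:.3f}, median={post_median:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

With correlated MCMC draws the same code applies, but the effective sample size (and hence the precision of these estimates) is smaller than `len(samples)`.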

Now, you can use this sample from the posterior to estimate the probability density $P(\theta|D)$, which is useful in constructing a Highest Posterior Density (HPD) region. I cannot think of any other application where you would need the probability values themselves. It is redundant to have both the population and its probability distribution because they provide the same information.
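
One way to follow this route, sketched under assumptions: estimate the density with a normalised histogram (`np.histogram(..., density=True)`), then add bins in order of decreasing estimated density until they hold 95% of the probability mass. The bin count and the bimodal toy sample are arbitrary illustrative choices, and `hpd_bins` / `mask_to_intervals` are hypothetical helper names. With a bimodal posterior the resulting HPD region can be a union of disjoint intervals, which a percentile interval cannot capture.

```python
import numpy as np

def hpd_bins(samples, level=0.95, bins=100):
    """Approximate HPD region from a histogram density estimate.
    Returns a boolean mask over bins and the bin edges."""
    dens, edges = np.histogram(samples, bins=bins, density=True)
    mass = dens * np.diff(edges)              # probability mass per bin (sums to 1)
    order = np.argsort(dens)[::-1]            # bins from highest to lowest density
    cum = np.cumsum(mass[order])
    n_keep = np.searchsorted(cum, level) + 1  # fewest bins whose mass reaches `level`
    keep = np.zeros(bins, dtype=bool)
    keep[order[:n_keep]] = True
    return keep, edges

def mask_to_intervals(keep, edges):
    """Merge runs of adjacent kept bins into (lower, upper) intervals."""
    intervals, start = [], None
    for i, k in enumerate(keep):
        if k and start is None:
            start = edges[i]
        if not k and start is not None:
            intervals.append((start, edges[i]))
            start = None
    if start is not None:
        intervals.append((start, edges[-1]))
    return intervals

rng = np.random.default_rng(2)
# Bimodal stand-in for MCMC draws: two well-separated normal components.
samples = np.concatenate([rng.normal(-3, 0.5, 10_000), rng.normal(3, 0.5, 10_000)])
keep, edges = hpd_bins(samples)
intervals = mask_to_intervals(keep, edges)
print(intervals)
```

The result depends on the bin width; a kernel density estimate is a common smoother alternative, at the cost of choosing a bandwidth.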

For your doubt regarding the normalization, you can refer to the article by Chib and Greenberg: http://www.tandfonline.com/doi/pdf/10.1080/00031305.1995.10476177. This article helped me understand MCMC very well.

Abhinav Gupta
The draws are from the normalized posterior density. We write the posterior as proportional to the unnormalized density (likelihood times prior) because the normalizing constant $P(D)$ does not depend on $\theta$, so it does not matter and drops out of the computations. Thus, the draws from an MCMC sampler are from the normalized posterior density.
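
To make this concrete, here is a minimal random-walk Metropolis sketch (an illustrative sampler chosen for simplicity, not necessarily the one used in the question). The acceptance test uses only the difference of log(likelihood × prior) between the proposal and the current point, so $\log P(D)$ would cancel. The same quantity is recorded at every draw, and that record is the set of "posterior values" the question asks about. The model (normal likelihood with known unit variance, $N(0, 10^2)$ prior on the mean) and all function names are hypothetical.

```python
import numpy as np

def log_unnorm_post(theta, data):
    """log P(D|theta) + log P(theta), dropping constants.
    Hypothetical model: data ~ N(theta, 1), prior theta ~ N(0, 10^2)."""
    log_lik = -0.5 * np.sum((data - theta) ** 2)
    log_prior = -0.5 * (theta / 10.0) ** 2
    return log_lik + log_prior

def metropolis(data, n_iter=20_000, step=0.3, theta0=0.0, seed=0):
    rng = np.random.default_rng(seed)
    theta, lp = theta0, log_unnorm_post(theta0, data)
    draws = np.empty(n_iter)
    log_posts = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal()
        lp_prop = log_unnorm_post(prop, data)
        # Accept with probability min(1, posterior ratio); log P(D) cancels here.
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        draws[i] = theta
        log_posts[i] = lp  # log posterior at the draw, up to the constant -log P(D)
    return draws, log_posts

data = np.random.default_rng(42).normal(1.5, 1.0, size=50)
draws, log_posts = metropolis(data)
```

Summaries of $\theta|D$ come from `draws`; `log_posts` is a by-product that costs nothing extra to store.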

Anon
    I've edited my question to remove the *unnormalized* part. The question is more about what to do with those values, rather than whether they are normalized or not. – Gabriel Oct 10 '17 at 15:12
  • These values describe your posterior distribution. You can use these values for whatever you use the posterior for. For example, you can get a 95% credible interval by taking the $2.5^{th}$ and $97.5^{th}$ percentiles, and the mean of $\theta|D$ by taking the sample mean. – Abhinav Gupta Oct 10 '17 at 15:26
  • But the statistics for the $\theta$ parameters are obtained from the histogram of the resulting parameter values (after the MCMC is finished), not from the posterior probability values themselves (to be clear: I'm talking about the *probability values*, not the *parameter* values). Perhaps I'm not understanding something here? – Gabriel Oct 10 '17 at 15:30
  • You use MCMC to get a large enough sample from posterior and then you treat this sample as if this is the whole population. Therefore, the histogram is the actual probability density function of $\theta|D.$ – Abhinav Gupta Oct 10 '17 at 15:32
  • Yes, precisely. But this leaves me with a large set of *probability values* that I am not using. I'm only using the sampled *parameter values* to construct their probability density function. – Gabriel Oct 10 '17 at 15:38
  • That's why it is crucial to get a large enough sample so that the sample is indeed the population. – Abhinav Gupta Oct 10 '17 at 15:39
  • I understand that @AbhinavGupta. The sampled *parameter values* are used to approximate the probability density function of said parameter. But the MCMC also gives you sampled *probability values*. My question is about whether these *probability values* are used for anything. – Gabriel Oct 10 '17 at 15:43
    Oh! If I understand your question right then the answer is no. I cannot think of any application where you will need probability values themselves. It is redundant to have both the population and its probability distribution because they provide the same information. Does this answer your question? – Abhinav Gupta Oct 10 '17 at 15:48
  • Would you like to turn your comment into an answer? – Gabriel Oct 10 '17 at 15:48