
I have a question that I came across while practicing. Basically the question is this:

Consider a random sample from the normal distribution with unknown mean and variance $Y_i \sim^{i.i.d.} N(\mu, \sigma^2)$ , $i = 1,...,n$ with the joint prior

$p(\mu, \sigma^2) \propto \sigma^{-1} (\sigma^2)^{-(v_0/2 + 1)} \exp\{-\frac{1}{2\sigma^2}[v_0 \sigma_0^2 + k_0(\mu_0 - \mu)^2]\}$, where $\mu_0, k_0, v_0$ and $\sigma_0^2$ are all known.

Derive the exact form of the marginal posterior distributions $p(\sigma^2 | Y_1,...,Y_n)$ and $p(\mu | Y_1,...,Y_n)$.

So I basically tried to write the joint posterior as the product of the prior $p(\mu, \sigma^2)$ and the likelihood function, which is $\frac{1}{(2\pi)^{n/2}\sigma^n} \exp(-\frac{1}{2\sigma^2} \sum(y_i-\mu)^2)$.

So I have this:

$\sigma^{-1} (\sigma^2)^{-(v_0/2 + 1)} \exp\{-\frac{1}{2\sigma^2}[v_0 \sigma_0^2 + k_0(\mu_0 - \mu)^2]\} \cdot \frac{1}{(2\pi)^{n/2}\sigma^n} \exp(-\frac{1}{2\sigma^2} \sum(y_i-\mu)^2)$

and then to find $p(\mu | Y_1,...,Y_n)$, I tried taking the integral of the above function from negative infinity to positive infinity.

However I got stuck.

Also, I just want to know if it is correct that when I try to find the marginal of $\mu$, i.e. $p(\mu | Y_1,...,Y_n)$, I would integrate with respect to $\sigma^2$ from $0$?

And when I try to find the marginal of $\sigma^2$, i.e. $p(\sigma^2 | Y_1,...,Y_n)$, I would integrate with respect to $\mu$ from negative infinity to positive infinity?

Could someone give some suggestions?

thanks

john_w
  • You are missing a power $n$ on the $\sigma$ in front. Otherwise the derivation of both marginals is quite straightforward. – Xi'an Nov 09 '19 at 00:32
  • Hello Xian, do you mean the n that I have put in there now? – john_w Nov 09 '19 at 23:42
  • Hello, I still couldn't figure out how to find the marginal of $\mu$ yet. I guess I should integrate with respect to $\sigma^2$ from negative infinity to positive infinity? But it seems that it is not trivial. – john_w Nov 10 '19 at 00:29
  • Hi john_w, variance cannot be negative, so the integration bounds are supposed to be $0$ and $\infty$. – Konstantin Nov 11 '19 at 15:55
  • The thread at https://stats.stackexchange.com/questions/52906/student-t-as-mixture-of-gaussian/59520#59520 appears relevant. – whuber Nov 18 '19 at 15:52

2 Answers


Answer:

The posterior of $\sigma^2|Y_1,..., Y_n$ is an inverse gamma distribution with probability density

$$ p(\sigma^2|Y_1,...,Y_n) = \frac{\beta^\alpha}{\Gamma(\alpha)} (\sigma^2)^{-\alpha-1}\exp\left(-\frac{\beta}{\sigma^2}\right), $$

where

\begin{align} \alpha:=\frac{\nu_0+n}{2}, \quad& \beta:=\frac{\nu_0\sigma_0^2+n(\hat \sigma^2 + \frac{k_0}{k_0+n}(\bar Y - \mu_0)^2)}{2},\\ \bar Y := \frac{1}{n}\sum_i Y_i, \quad &\hat \sigma^2 := \frac{1}{n}\sum_i(Y_i-\bar Y)^2. \end{align}

The posterior of $\mu|Y_1,...,Y_n$ is a shifted and scaled Student's t-distribution. Its probability density function can be written as

$$ p(\mu|Y_1,...,Y_n) = \frac{\Gamma \left(\frac{\nu+1}{2} \right)} {\sqrt{\nu\pi\gamma}\,\Gamma \left(\frac{\nu}{2} \right)} \left(1+\frac{(\mu-\mu')^2}{\nu \gamma} \right)^{-\frac{\nu+1}{2}},$$

where $\Gamma(x)$ is the Gamma function, $\nu:=\nu_0+n$ is the number of degrees of freedom, $\gamma:=\frac{\beta}{\alpha(n+k_0)}$ is the squared scale parameter, and $\mu' := \frac{n}{n+k_0}\bar Y + \frac{k_0}{n+k_0}\mu_0$ is the mean of the interim posterior $\mu|\tau,\mathbf{Y}$:

\begin{equation} \mu|\tau,\mathbf{Y} \sim N(\mu', ((n+k_0)\tau)^{-1}). \qquad\qquad (\star) \end{equation}
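These closed-form results are straightforward to evaluate numerically. Below is a minimal numpy/scipy sketch of the formulas above; the data `y` and the hyperparameter values are invented purely for illustration:

```python
import numpy as np
from scipy import stats

# Invented data and hyperparameters, for illustration only
rng = np.random.default_rng(42)
y = rng.normal(loc=2.0, scale=1.5, size=50)
mu0, k0, nu0, sigma0_sq = 0.0, 1.0, 2.0, 1.0

n = len(y)
ybar = y.mean()
sighat_sq = np.mean((y - ybar) ** 2)          # (1/n) * sum_i (y_i - ybar)^2

# Parameters of the marginal posteriors stated above
alpha = (nu0 + n) / 2
beta = (nu0 * sigma0_sq
        + n * (sighat_sq + k0 / (k0 + n) * (ybar - mu0) ** 2)) / 2
nu = nu0 + n                                   # degrees of freedom
mu_prime = (n * ybar + k0 * mu0) / (n + k0)    # posterior mean of mu
gamma = beta / (alpha * (n + k0))              # squared scale of the t

# sigma^2 | Y ~ InvGamma(alpha, beta);  mu | Y ~ t_nu(mu', sqrt(gamma))
post_sigma_sq = stats.invgamma(a=alpha, scale=beta)
post_mu = stats.t(df=nu, loc=mu_prime, scale=np.sqrt(gamma))
print(post_sigma_sq.mean(), post_mu.mean())
```

With these frozen distributions one can compute posterior means, quantiles, or credible intervals directly.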

Detailed steps:

(There is a great variety of sources online, one of which I am reproducing almost without changes: the lecture notes of Michael I. Jordan from Berkeley.)

The derivation is quite straightforward once we introduce the notation $\tau := \frac{1}{\sigma^2}$, $\alpha_0 := \frac{\nu_0}{2}$, $\beta_0 := \alpha_0 \sigma^2_0$, and recognize in the stated prior the following hierarchical model:

\begin{align} Y_i &\sim N(\mu, \tau^{-1})\\ \mu &\sim N(\mu_0, (k_0\tau)^{-1})\\ \tau &\sim Gamma(\alpha_0,\beta_0)\\ \end{align}

where $Gamma$ stands for Gamma distribution with the probability density function

$$p(\tau|\alpha_0,\beta_0) = \frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)} \tau^{\alpha_0-1}\exp(-\tau \beta_0).$$

We are looking for the posterior $\mu| \mathbf{Y}$ and $\tau | \mathbf{Y}$, where $\mathbf{Y}:= (Y_1,...,Y_n) $.

Obtaining posterior $\mu|\mathbf{Y}$ is a matter of taking an expectation of pdf of the interim posterior $(\star)$ with respect to the posterior $\tau|\mathbf{Y}$:

\begin{equation} p(\mu|\mathbf{Y}) = \int_0^\infty p(\mu|\tau, \mathbf{Y}) p(\tau|\mathbf{Y})d \tau \quad (*) \end{equation}

So the first step would be to obtain the posterior $\tau| \mathbf{Y}$, which is just the marginal of the joint posterior $\mu,\tau |\mathbf{Y}$:

\begin{align} p(\tau,\mu|\mathbf{Y}) \propto & \prod_i p(Y_i|\mu,\tau)\cdot p(\mu|\tau)\cdot p(\tau) \\ \propto & \tau^{n/2} \exp\left(-\frac{\tau}{2}\sum_i(Y_i-\mu + \bar Y - \bar Y)^2\right) \cdot \tau^{1/2} \exp\left( -\frac{k_0 \tau}{2}(\mu-\mu_0)^2 \right) \cdot \tau^{\alpha_0-1} \exp(-\beta_0\tau) \\ \propto & \tau^{\alpha_0 + \frac{n}{2}-1}\exp\left(-\tau(\beta_0 + \frac{1}{2}\sum(Y_i-\bar Y)^2) \right) \tau^{1/2} \cdot \exp\left(-\frac{\tau}{2}(k_0(\mu-\mu_0)^2+n(\bar Y - \mu)^2)\right) \end{align}

In the last expression we can factorize a kernel of a normal out of the second term (on the right of $\cdot$):

\begin{align} &\exp\left(-\frac{\tau}{2}(k_0(\mu-\mu_0)^2+n(\bar Y - \mu)^2)\right) =\\ &= \exp\left(-\frac{\tau}{2}((k_0+n)\mu^2-2(k_0\mu_0 + n\bar Y) \mu + k_0 \mu_0^2+n\bar Y^2)\right)\\ &= \exp\left(-\frac{\tau}{2}((k_0+n)(\mu^2-2\frac{k_0\mu_0 + n\bar Y}{k_0+n}\mu + {\mu'}^2) - (k_0+n){\mu'}^2 + k_0 \mu_0^2+n\bar Y^2)\right)\\ &= \exp(-\frac{\tau}{2}(k_0+n)(\mu - \mu')^2) \cdot \exp\left(\frac{\tau}{2}( \frac{k_0^2 \mu_0^2 + 2 n k_0 \mu_0 \bar Y + n^2 \bar Y^2}{n+k_0} - k_0 \mu_0^2 -n \bar Y^2)\right)\\ &= \tau^{1/2} \exp(-\frac{\tau}{2}(k_0+n)(\mu - \mu')^2) \cdot \tau^{-1/2}\exp\left(-\frac{n k_0 \tau}{2(n+k_0)} (\bar Y -\mu_0)^2\right) \end{align}

The first term in the above product integrates over $\mu$ to $\sqrt{\frac{2\pi}{k_0+n}}$ (it is, up to normalization, the pdf of $N(\mu', \frac{1}{(n+k_0)\tau})$) and may be neglected, whereas the second term is factored out, leaving us with the following posterior for $\tau| \mathbf{Y}$:

\begin{equation} p(\tau|\mathbf{Y}) \propto \tau^{\alpha_0 + \frac{n}{2}-1} \exp(-\tau \left(\beta_0 + \frac{1}{2}\sum_i(Y_i-\bar Y)^2 + \frac{n k_0}{2(n+k_0)}(\bar Y -\mu_0)^2\right)) \end{equation}

in which the kernel of a Gamma distribution is easily recognizable, i.e.

$$\tau|\mathbf{Y} \sim Gamma(\alpha,\beta)$$

where $\alpha := \alpha_0 + \frac{n}{2}$ and $\beta := \beta_0 + \frac{1}{2}\sum_i(Y_i-\bar Y)^2 + \frac{n k_0}{2(n+k_0)}(\bar Y -\mu_0)^2$.

Finally, we compute the expectation $(*)$:

\begin{align} p(\mu|\mathbf{Y}) = & \int_0^\infty \frac{{\beta}^{\alpha}}{\Gamma(\alpha)}\tau^{\alpha-1} \exp(-\tau \beta) \cdot \frac{(n+k_0)^{1/2}\tau^{1/2}}{\sqrt{2 \pi}} \exp\left(-\frac{n+k_0}{2}\tau (\mu - \mu')^2 \right) d\tau \\ \propto & \int_0^\infty \tau^{\alpha+\frac{1}{2}-1} \exp\left(-\tau \beta - \tau \frac{n+k_0}{2}(\mu - \mu')^2 \right) d\tau \quad (**)\\ \propto & \Gamma(\alpha + \frac{1}{2}) \left(\beta + \frac{n+k_0}{2}(\mu-\mu')^2\right)^{-\alpha-\frac{1}{2}} \\ \propto & (1 + \frac{1}{2\alpha}\frac{(\mu-\mu')^2}{\frac{\beta}{(n+k_0)\alpha}})^{-\frac{2\alpha+1}{2}} \end{align}

In the integrand of expression $(**)$ we see the kernel of $Gamma(\alpha+\frac{1}{2}, \beta + \frac{n+k_0}{2}(\mu - \mu')^2)$, which integrates to an expression in which one can easily recognize the kernel of a Student's t-distribution with mean $\mu'$, squared scale parameter $\frac{\beta}{(n+k_0)\alpha}$ and $2\alpha$ degrees of freedom.
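The scale-mixture step $(*)$/$(**)$ can also be checked by simulation: draw $\tau$ from its $Gamma(\alpha,\beta)$ posterior, draw $\mu\mid\tau$ from the interim normal posterior $(\star)$, and compare the resulting draws with the closed-form Student's t. A sketch, with data and hyperparameters invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Invented data and hyperparameters, purely for illustration
y = rng.normal(1.0, 2.0, size=30)
mu0, k0, nu0, sigma0_sq = 0.0, 2.0, 4.0, 1.5

n, ybar = len(y), y.mean()
alpha0, beta0 = nu0 / 2, (nu0 / 2) * sigma0_sq
alpha = alpha0 + n / 2
beta = (beta0 + 0.5 * np.sum((y - ybar) ** 2)
        + n * k0 / (2 * (n + k0)) * (ybar - mu0) ** 2)
mu_prime = (n * ybar + k0 * mu0) / (n + k0)

# Sample tau | Y ~ Gamma(alpha, beta), then mu | tau, Y ~ N(mu', 1/((n+k0) tau))
m = 400_000
tau = rng.gamma(shape=alpha, scale=1 / beta, size=m)
mu_draws = rng.normal(mu_prime, 1 / np.sqrt((n + k0) * tau))

# Closed-form marginal: Student's t with 2*alpha df, location mu',
# squared scale beta / ((n + k0) * alpha)
t_marginal = stats.t(df=2 * alpha, loc=mu_prime,
                     scale=np.sqrt(beta / ((n + k0) * alpha)))
print(mu_draws.var(), t_marginal.var())
```

The empirical mean and variance of `mu_draws` should agree with those of `t_marginal` up to Monte Carlo error.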

Konstantin
  • hello, could you explain a little more about how you get the term right after you said "being a kernel of a normal integrates (up to a constant factor) to". Sorry I mean how does the 2nd term in the product (i.e the term after the * ) becomes this term (the term I am referring to here) when you integrate with respect to $\mu$? I think I might have to group some terms to make that a Normal Distributed density function with certain means and variances, but I have not figured out how to group the terms yet. – john_w Nov 15 '19 at 21:49
  • Hey, sure. I added a clarification, 5 additional lines in the middle. To get to the fourth from the third, you have to just carefully open the parentheses. – Konstantin Nov 15 '19 at 22:54
  • Hi, sorry may I know what is the $\bar{\mu}$? – john_w Nov 16 '19 at 00:38
  • Also sorry how does the $2n\bar{Y}\mu$ in the second line disappears in the third line in your newly additionally added 5 lines? – john_w Nov 16 '19 at 00:52
  • $\bar \mu$ was a mistake on my behalf, it is really $\mu' = \frac{n\bar Y + k_0 \mu_0}{n+k_0}$. I also added one more intermediate line, hope it makes everything clearer – Konstantin Nov 16 '19 at 11:01
  • Hello, could you just explain how to get from the 2nd last line to the last line, especially the $\exp\frac{nk_0\tau}{2(n+k_0)} (\bar{Y}-\mu_0)^2$ – john_w Nov 16 '19 at 22:20
  • for example in the line before last line there is no term that involves $\bar{Y}\mu_0$, but in the last line, there is a term of $(\bar{Y}-\mu_0)^2$, so it means there is a term of $2*\bar{Y}*\mu_0$ here (when you expand $(\bar{Y}-\mu_0)^2$). But there is none in the line preceding to it. May I know is it because you are discarding the terms that involves $\bar{Y}\mu0$ (for example term such as 2*$\bar{Y}\mu_0$) as part of the proportional constant? – john_w Nov 16 '19 at 22:42
  • I am just using the definition of $\mu'$ back and forth. It is the posterior mean of a normal, very convenient. Anyway, I rewrote the penultimate line in that bunch, I hope it makes everything clear. And please note, there is no discarding of proportional constants in that piece as $\tau$ is involved in every term in the product. – Konstantin Nov 17 '19 at 01:28
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/101134/discussion-between-konstantin-and-john-w). – Konstantin Nov 17 '19 at 01:31
  • Hi Konstantin, sorry I just have one more question. I am just wondering how to extract the $\tau$ coefficient for each of the distribution that you have there when you write out the joint posterior of $p(\mu, \tau|Y)$: for example how do you get the $\tau^{1/2}$ for the second term, and $\tau^{\alpha_0-1}$ for the third term? – john_w Nov 18 '19 at 03:28
  • because to write out the joint posterior: I multiply the $P(Y_1,Y_2,... | \mu, \tau)$ by the joint prior. – john_w Nov 18 '19 at 03:29
  • So I got the joint having the following combination of $\tau$: $(\frac{1}{\sigma^2}) ^{n/2}$ (this comes from the distribution of the $Y_1,Y_2,...,Y_n$, and then the $\frac{1}{\sigma^n} (\sigma^2)^{-(v_0/2 + 1)}$. – john_w Nov 18 '19 at 03:30
  • So I got $\tau^{{(\frac{n}{2})}}$ from the distribution of $Y_1,Y_2,...,Y_n$, $\tau^{\frac{n}{2}}$ from $\frac{1}{\sigma^n}$ and then $\tau^{\frac{v_0}{2} + 1}$ from $(\sigma^2)^{-(\frac{v_0}{2} + 1 )}$. But the sum of exponents in those $\tau$ seem to be $\frac{n}{2} + \frac{n}{2} + \frac{v_0}{2} + 1 $ – john_w Nov 18 '19 at 03:36
  • I think this will be the last question I have. Sorry about that. Could you just clarify about the exponent of the $\tau$. – john_w Nov 18 '19 at 03:42
  • Initially, you only have $(\frac{1}{\sigma^2})^{n/2}$ from the likelihood of $\mathbf{Y}$, $\sigma^{-1}$ from the prior of $\sigma^2$ and $\sigma^{-1}$ from the prior of $\mu$. There's no $\frac{1}{\sigma^n}$ coming from the prior of $\mu$. – Konstantin Nov 18 '19 at 05:20
  • Hi, but could you have a look at my "joint prior" in my question? I have the $ \frac{1}{\sigma^n} (\sigma^2)^{-(v_0/2 + 1)} $ in the front of the joint prior. It is after the "Edit". – john_w Nov 18 '19 at 05:24
  • Sure, I did. Joint prior in the main text was OK, you misinterpreted @Xi'an's comment about the power $n$: it is the formula for the likelihood that was supposed to have $\frac{1}{\sigma^n}$ instead of $\frac{1}{\sigma}$. **$n$ is the characteristic of the data you observe, it may not be a parameter of the prior**. I edited your question statement accordingly. – Konstantin Nov 18 '19 at 05:26
  • Hi yes, the n was missing from the density of the $Y_1,Y_2,...$, but then from the multiplication of the likelihood function and the joint prior (please look at the joint prior instead of the individual $p(\mu|\tau)$ and $p(\tau)$, I still don't get the correct sum of the power of $\tau$. For example from the likelihood I get the power of $\tau$ to be $\frac{n}{2}$, and then from the joint prior I got $\sigma^{-1} = (\sigma^2)^{(-\frac{1}{2})} = \tau^{\frac{1}{2}}$ , and finally the $\tau^{(\frac{v_0}{2}+1)}$, so here in total there is a power of $n/2 + 1/2 + v_0/2 + 1 $. – john_w Nov 18 '19 at 05:44
  • I understand your answer is based on the link you provided there (as it was based on assuming we know the individual distribution of the prior of tau and the conditional distribution of $\mu$ given $\tau$). But because I was given the joint prior to be like written in the question, and it seems the sum of the power of the $\tau$ doesn't seem to equal to the one from the pdf in the link. So I was not sure why I don't get the same power of $\tau$. Because it could effect the $\alpha$ parameter from the Gamma Distribution if the power is not correct. – john_w Nov 18 '19 at 05:56
  • Agreed, my parameterization was off, apologies. Everything should work if $\alpha_0 := \frac{\nu_0+4}{2}$ and $\beta_0 := \frac{\nu_0\sigma_0^2}{2}$. Please do let me know if you find something else inconsistent. – Konstantin Nov 18 '19 at 06:27
  • Hi, sorry could you explain in details about why when $\alpha_0 := \frac{v_0 + 4}{2}$ and $\beta_0 := \frac{v_0 \sigma_0^2}{2}$ would make it work? Anyway I will give you the bounty first. Maybe could you edit your answer (but leave the old answer just for comparison here by adding new lines below the old answer). – john_w Nov 18 '19 at 06:37

For this type of analysis, it is often possible to decompose the posterior density into a part representing the marginal posterior of one of the parameters, and another part representing the conditional posterior of the other parameter. It turns out to be possible to do this in the present case.

To facilitate our analysis, let us define the useful posterior quantities:

$$\mu_* \equiv \frac{n \bar{y} + k_0 \mu_0}{n + k_0} \quad \quad \quad \quad \quad \beta_* \equiv \frac{||\mathbf{y}||^2 + v_0 \sigma_0^2 + k_0 \mu_0^2 - (n+k_0) \mu_*^2}{2}.$$

Now, we can solve this problem by writing out the posterior kernel, and then collect all terms involving $\mu$ and simplify this into the kernel of a known density function (in this case the normal density). Using the method of completing the square, we obtain:

$$\begin{equation} \begin{aligned} p(\mu, \sigma^2 | \mathbf{y}) &\propto L_\mathbf{y}(\mu, \sigma^2) \cdot p(\mu, \sigma^2) \\[6pt] &\propto \sigma^{-n} \exp \Bigg( - \frac{1}{2 \sigma^2} \cdot \sum_{i=1}^n ( y_i-\mu)^2 \Bigg) \cdot \sigma^{-v_0 -3} \exp \Bigg( -\frac{1}{2\sigma^2} [v_0 \sigma_0^2+ k_0(\mu_0 - \mu)^2] \Bigg) \\[6pt] &= \sigma^{-n-v_0 -3} \exp \Bigg( - \frac{1}{2 \sigma^2} \Bigg[ \sum_{i=1}^n ( y_i-\mu)^2 +v_0 \sigma_0^2+ k_0(\mu_0 - \mu)^2 \Bigg] \Bigg) \\[6pt] &= \sigma^{-n-v_0 -3} \exp \Bigg( - \frac{1}{2 \sigma^2} \Bigg[ (||\mathbf{y}||^2 -2 n \bar{y} \mu + n \mu^2) + v_0 \sigma_0^2 + (k_0 \mu_0^2 - 2 k_0 \mu_0 \mu + k_0 \mu^2) \Bigg] \Bigg) \\[6pt] &= \sigma^{-n-v_0 -3} \exp \Bigg( - \frac{1}{2 \sigma^2} \Bigg[ -2 (n \bar{y} + k_0 \mu_0 ) \mu + (n + k_0) \mu^2 + ||\mathbf{y}||^2 + v_0 \sigma_0^2 + k_0 \mu_0^2 \Bigg] \Bigg) \\[6pt] &= \sigma^{-n-v_0 -3} \exp \Bigg( - \frac{n + k_0}{2 \sigma^2} \Bigg[ -2 \mu_* \mu + \mu^2 \Bigg] - \frac{||\mathbf{y}||^2 + v_0 \sigma_0^2 + k_0 \mu_0^2}{2 \sigma^2} \Bigg) \\[6pt] &= \sigma^{-n-v_0 -3} \exp \Bigg( - \frac{n + k_0}{2 \sigma^2} \Bigg[ \mu_*^2 -2 \mu_* \mu + \mu^2 \Bigg] - \frac{\beta_*}{\sigma^2} \Bigg) \\[6pt] &= \sigma^{-n-v_0 -3} \exp \Bigg( - \frac{n + k_0}{2 \sigma^2} ( \mu - \mu_* )^2 - \frac{\beta_*}{\sigma^2} \Bigg) \\[6pt] &= \sigma^{-1} \exp \Bigg( - \frac{n + k_0}{2 \sigma^2} ( \mu - \mu_* )^2 \Bigg) \cdot \sigma^{-n-v_0 -2} \exp \Bigg( - \frac{\beta_*}{\sigma^2} \Bigg) \\[6pt] &= \sigma^{-1} \exp \Bigg( - \frac{n + k_0}{2 \sigma^2} ( \mu - \mu_* )^2 \Bigg) \cdot (\sigma^2)^{-(n+v_0)/2 -1} \exp \Bigg( - \frac{\beta_*}{\sigma^2} \Bigg) \\[6pt] &\propto \text{N} \Big( \mu \Big| \mu_*, \frac{\sigma^2}{n + k_0} \Big) \cdot \text{InvGa} \Big( \sigma^2 \Big| \frac{n+v_0}{2}, \beta_* \Big). \\[6pt] \end{aligned} \end{equation}$$

Since the posterior is a probability density, and the product of densities on the right-hand side already integrates to one, the proportionality constant must be one, giving:

$$p(\mu, \sigma^2 | \mathbf{y}) = \text{N} \Big( \mu \Big| \mu_*, \frac{\sigma^2}{n + k_0} \Big) \cdot \text{InvGa} \Big( \sigma^2 \Big| \frac{n+v_0}{2}, \beta_* \Big). $$

From this equation we obtain the marginal distribution:

$$\sigma^2 | \mathbf{y} \sim \text{InvGa} \Big( \frac{n+v_0}{2}, \beta_* \Big). $$
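One quick way to verify this factorization (and in particular the completing-the-square step defining $\mu_*$ and $\beta_*$) is to evaluate the log of the unnormalized posterior kernel and the log of the Normal $\times$ Inverse-Gamma product at a few points and confirm they differ only by a constant. A sketch with invented data and hyperparameters:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Invented data and hyperparameters, for illustration only
y = rng.normal(0.5, 1.0, size=20)
mu0, k0, v0, sigma0_sq = 1.0, 3.0, 5.0, 2.0

n, ybar = len(y), y.mean()
mu_star = (n * ybar + k0 * mu0) / (n + k0)
beta_star = (np.sum(y ** 2) + v0 * sigma0_sq + k0 * mu0 ** 2
             - (n + k0) * mu_star ** 2) / 2

def log_kernel(mu, s2):
    """Log of likelihood times prior, dropping constants."""
    return (-(n + v0 + 3) * np.log(np.sqrt(s2))
            - (np.sum((y - mu) ** 2) + v0 * sigma0_sq
               + k0 * (mu0 - mu) ** 2) / (2 * s2))

def log_factorized(mu, s2):
    """Log of N(mu | mu*, s2/(n+k0)) * InvGa(s2 | (n+v0)/2, beta*)."""
    return (stats.norm.logpdf(mu, mu_star, np.sqrt(s2 / (n + k0)))
            + stats.invgamma.logpdf(s2, a=(n + v0) / 2, scale=beta_star))

# The difference should be the same constant at every (mu, sigma^2)
pts = [(0.0, 0.5), (1.3, 2.0), (-2.0, 4.5)]
diffs = [log_kernel(m, s) - log_factorized(m, s) for m, s in pts]
print(diffs)
```

If $\beta_*$ were mis-specified (e.g. with $\mu_*$ instead of $\mu_*^2$), the differences would vary with $\sigma^2$, so this check catches exactly that kind of slip.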

Ben
  • hello Reinstate Monica, I actually have a question about your answer in almost the second last line of your detailed proof, may I know how you get the $\sigma ^ {-n -v_0 -2} exp(\frac{-\beta_*}{\sigma^2})$ to have the inverse gamma with the first parameter being $n+v_0 + 1$? – john_w Dec 14 '19 at 22:13
  • because there must be a term of $\sigma^2$ to the left of the $exp(\frac{-\beta_*}{\sigma^2})$ – john_w Dec 14 '19 at 22:14
  • @john_w: Well spotted --- edited to fix error. – Ben Dec 14 '19 at 22:50
  • now that when I go over all answers more slowly, I think actually your answer is more close to what I need because it seems that you are going from unknown to known (to actually answer the question from what given in question). But sorry I already awarded the points, should have split if possible or award to your answer. But my other question is your answer and the answer that I awarded the point both got InverseGamma for the posterior of sigma, but the answer for the posterior of mu is different. The other answer has student-t as the marginal of the posterior of mu. – john_w Dec 14 '19 at 23:12
  • I now think your answer seems more likely to be correct. May I know do you think the marginal posterior is normal instead of student-t? because it seems it is normal distributed based on your detailed and seem to be correct steps. – john_w Dec 14 '19 at 23:13
  • The normal distribution in my working is the *conditional distribution* of $\mu$ given knowledge of $\sigma$ --- the marginal distribution would still be a T-distribution. (Don't worry about the points - I've got plenty of points.) – Ben Dec 15 '19 at 03:00
  • Hello, then does it mean to get the answer of marginal posterior of $\mu$, you would have to integrate that conditional distribution of $\mu$ (using the mean parameter and the variance parameter in this particular conditional normal distribution) with respect to $\sigma^2$ from 0 to infinity? and then that distribution should be a t distribution? – john_w Dec 15 '19 at 03:26
  • if it is the case, could you provide little more step to show how the conditional of $\mu$ being normal would become a t-distribution after you integrate out the $\sigma^2$? – john_w Dec 15 '19 at 03:28