
This is a very simple question but I can't find the derivation anywhere on the internet or in a book. I would like to see the derivation of how one Bayesian updates a multivariate normal distribution. For example: imagine that

$$ \begin{array}{rcl} \mathbb{P}({\bf x}|{\bf \mu},{\bf \Sigma}) & = & N({\bf \mu}, {\bf \Sigma}) \\ \mathbb{P}({\bf \mu}) &= & N({\bf \mu_0}, {\bf \Sigma_0})\,. \end{array} $$

After observing a sample ${\bf x_1}, \ldots, {\bf x_n}$, I would like to compute $\mathbb{P}({\bf \mu} \mid {\bf x_1}, \ldots, {\bf x_n})$. I know that the answer is $\mathbb{P}({\bf \mu} \mid {\bf x_1}, \ldots, {\bf x_n}) = N({\bf \mu_n}, {\bf \Sigma_n})$, where

$$ \begin{array}{rcl} \bf \mu_n &=& \displaystyle\Sigma_0 \left(\Sigma_0 + \frac{1}{n}\Sigma\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}{\bf x_i}\right) + \frac{1}{n}\Sigma\left(\Sigma_0+\frac{1}{n}\Sigma\right)^{-1}\mu_0 \\ \bf \Sigma_n & =&\displaystyle \Sigma_0\left(\Sigma_0 + \frac{1}{n}\Sigma\right)^{-1}\frac{1}{n}\Sigma \end{array}$$
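For what it's worth, this claimed result is easy to sanity-check numerically (a sketch with arbitrary test values, not a proof): for large $n$ the posterior mean should approach the sample mean, and the posterior covariance should shrink like $1/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 5000

A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)   # likelihood covariance (SPD)
Sigma0 = np.eye(d)                # prior covariance
mu0 = np.zeros(d)                 # prior mean

x = rng.multivariate_normal(np.array([1.0, -2.0, 0.5]), Sigma, size=n)
xbar = x.mean(axis=0)

# Claimed posterior parameters
M = np.linalg.inv(Sigma0 + Sigma / n)
mu_n = Sigma0 @ M @ xbar + (Sigma / n) @ M @ mu0
Sigma_n = Sigma0 @ M @ (Sigma / n)

print(np.allclose(mu_n, xbar, atol=0.05))  # True: the data overwhelm the prior
print(np.max(np.abs(Sigma_n)))             # small: uncertainty shrinks like 1/n
```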

I am looking for the derivation of this result with all the intermediate matrix algebra.

Any help is much appreciated.

Chill2Macht
Alex
  • It is also solved in our book [Bayesian Core](http://www.amazon.com/gp/product/1441919392?ie=UTF8&tag=chrprobboo-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=1441919392), Chap. 3, Section 3.2, pages 54-57, with what we think is detailed matrix algebra! – Xi'an May 18 '12 at 20:10
  • @Alex: Sorry, wrong link, I meant [Bayesian Core](http://www.amazon.com/gp/product/1441922865/ref=as_li_ss_tl?ie=UTF8&tag=chrprobboo-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=1441922865). Note that we also posted [solutions to all problems on arXiv](http://xianblog.wordpress.com/2009/10/25/solution-manual-to-bayesian-core-on-line/). So posting a complete solution here would not hurt! – Xi'an May 19 '12 at 07:56
  • Just as an FYI, the derivation is in Pattern Classification by Duda, Hart and Stork. However, I was having difficulty following some of their steps, which only matters to me. If this were simply homework, one could just write down exactly what they have. – Alex May 19 '12 at 16:46

1 Answer


Start from the distributions of the random vectors:

$\mathbf x_i \mid \mu \sim N(\mu, \mathbf \Sigma)$

$\mu \sim N(\mu_0, \mathbf \Sigma_0)$

By Bayes' rule, the posterior density is proportional to the prior times the likelihood:

$p(\mu| \{\mathbf x_i\}) \propto p(\mu) \prod_{i=1}^N p(\mathbf x_i | \mu)$

So:

$\ln p(\mu| \{\mathbf x_i\}) = -\frac{1}{2}\sum_{i=1}^N(\mathbf x_i - \mu)'\mathbf \Sigma^{-1}(\mathbf x_i - \mu) -\frac{1}{2}(\mu - \mu_0)'\mathbf \Sigma_0^{-1}(\mu - \mu_0) + const$

Expanding the quadratic forms and absorbing every term that does not involve $\mu$ into the constant:

$ = -\frac{1}{2} N \mu' \mathbf \Sigma^{-1} \mu + \sum_{i=1}^N \mu' \mathbf \Sigma^{-1} \mathbf x_i -\frac{1}{2} \mu' \mathbf \Sigma_0^{-1} \mu + \mu' \mathbf \Sigma_0^{-1} \mu_0 + const$

$ = -\frac{1}{2} \mu' (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1}) \mu + \mu' (\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i) + const$

Completing the square in $\mu$ (the leftover terms involving only $\mu_0$ and the $\mathbf x_i$ are absorbed into the constant):

$= -\frac{1}{2}(\mu - (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1}(\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i))' (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1}) (\mu - (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1}(\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i)) + const$

Which is the log density of a Gaussian:

$\mu| \{\mathbf x_i\} \sim N((N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1}(\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i), (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1})$
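As a quick sanity check on this step (a sketch with arbitrary test matrices, not part of the derivation): the mean of a Gaussian is also its mode, so the claimed posterior mean $(N\mathbf\Sigma^{-1}+\mathbf\Sigma_0^{-1})^{-1}(\mathbf\Sigma_0^{-1}\mu_0 + \mathbf\Sigma^{-1}\sum_i \mathbf x_i)$ should maximize the unnormalized log posterior written above.

```python
import numpy as np

rng = np.random.default_rng(1)

d, N = 2, 10
Sigma = np.array([[1.5, 0.4], [0.4, 1.0]])    # likelihood covariance
Sigma0 = np.array([[2.0, 0.0], [0.0, 0.5]])   # prior covariance
mu0 = np.array([1.0, -1.0])                   # prior mean
x = rng.multivariate_normal([0.5, 0.5], Sigma, size=N)

Si, S0i = np.linalg.inv(Sigma), np.linalg.inv(Sigma0)

def log_post(mu):
    """Unnormalized log posterior ln p(mu | x_1..x_N)."""
    lik = -0.5 * sum((xi - mu) @ Si @ (xi - mu) for xi in x)
    pri = -0.5 * (mu - mu0) @ S0i @ (mu - mu0)
    return lik + pri

A = N * Si + S0i                       # posterior precision
b = S0i @ mu0 + Si @ x.sum(axis=0)
mu_n = np.linalg.solve(A, b)           # claimed posterior mean

# The claimed mean maximizes the (strictly concave) log posterior
for _ in range(100):
    delta = 0.1 * rng.standard_normal(d)
    assert log_post(mu_n) >= log_post(mu_n + delta)
print("mu_n is the mode:", mu_n)
```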

Using the Woodbury identity on our expression for the covariance matrix:

$(N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1} = \mathbf \Sigma(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \frac{1}{N} \mathbf \Sigma_0$
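This identity is easy to verify numerically for arbitrary symmetric positive-definite matrices (a quick sketch, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(2)
d, N = 4, 7

def random_spd(d):
    """Random symmetric positive-definite matrix."""
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)

Sigma, Sigma0 = random_spd(d), random_spd(d)

lhs = np.linalg.inv(N * np.linalg.inv(Sigma) + np.linalg.inv(Sigma0))
rhs = Sigma @ np.linalg.inv(Sigma / N + Sigma0) @ (Sigma0 / N)

assert np.allclose(lhs, rhs)
print("identity holds")
```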

This is the posterior covariance matrix in the form the OP wanted (with $N$ in place of $n$). Since this matrix is symmetric, it also equals its transpose $\frac{1}{N}\mathbf\Sigma_0(\frac{1}{N}\mathbf\Sigma + \mathbf\Sigma_0)^{-1}\mathbf\Sigma$; substituting one form into each term of the expression for the mean gives:

$\mathbf \Sigma(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \frac{1}{N} \mathbf \Sigma_0 \mathbf \Sigma_0^{-1} \mu_0 + \frac{1}{N} \mathbf \Sigma_0(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \mathbf \Sigma \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i$

$= \mathbf \Sigma(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \frac{1}{N} \mu_0 + \mathbf \Sigma_0(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \sum_{i=1}^N (\frac{1}{N} \mathbf x_i)$

Which is the form required by the OP for the mean.
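Putting the pieces together, one can check numerically (with arbitrary SPD test matrices; a sketch, not a proof) that the precision form derived above and the OP's form agree:

```python
import numpy as np

rng = np.random.default_rng(3)
d, N = 3, 12
A = rng.standard_normal((d, d)); Sigma = A @ A.T + d * np.eye(d)
B = rng.standard_normal((d, d)); Sigma0 = B @ B.T + d * np.eye(d)
mu0 = rng.standard_normal(d)
x = rng.multivariate_normal(np.zeros(d), Sigma, size=N)
xbar = x.mean(axis=0)

Si, S0i = np.linalg.inv(Sigma), np.linalg.inv(Sigma0)

# Precision form, from completing the square
Sigma_n1 = np.linalg.inv(N * Si + S0i)
mu_n1 = Sigma_n1 @ (S0i @ mu0 + Si @ x.sum(axis=0))

# OP's form, after applying the Woodbury identity
M = np.linalg.inv(Sigma0 + Sigma / N)
Sigma_n2 = Sigma0 @ M @ (Sigma / N)
mu_n2 = Sigma0 @ M @ xbar + (Sigma / N) @ M @ mu0

assert np.allclose(Sigma_n1, Sigma_n2)
assert np.allclose(mu_n1, mu_n2)
print("both forms agree")
```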

conjectures
  • Could you make your derivation and notation more explicit? It's not clear how you jumped to the posterior mean and covariance. – Soubriquet Apr 06 '20 at 16:33