
This is a very simple question but I can't find the derivation anywhere on the internet or in a book. I would like to see the derivation of how one Bayesian updates a multivariate normal distribution. For example: imagine that

$$ \begin{array}{rcl} \mathbb{P}({\bf x}|{\bf \mu},{\bf \Sigma}) & = & N({\bf \mu}, {\bf \Sigma}) \\ \mathbb{P}({\bf \mu}) &= & N({\bf \mu_0}, {\bf \Sigma_0})\,. \end{array} $$

After observing a sample ${\bf x_1}, \ldots, {\bf x_n}$, I would like to compute $\mathbb{P}({\bf \mu} \mid {\bf x_1}, \ldots, {\bf x_n})$. I know that the answer is $\mathbb{P}({\bf \mu} \mid {\bf x_1}, \ldots, {\bf x_n}) = N({\bf \mu_n}, {\bf \Sigma_n})$, where

$$ \begin{array}{rcl} \bf \mu_n &=& \displaystyle\Sigma_0 \left(\Sigma_0 + \frac{1}{n}\Sigma\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}{\bf x_i}\right) + \frac{1}{n}\Sigma\left(\Sigma_0+\frac{1}{n}\Sigma\right)^{-1}\mu_0 \\ \bf \Sigma_n & =&\displaystyle \Sigma_0\left(\Sigma_0 + \frac{1}{n}\Sigma\right)^{-1}\frac{1}{n}\Sigma \end{array}$$
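For what it's worth, this claimed result is easy to sanity-check numerically (a sketch with arbitrary test values, not a proof): for large $n$ the posterior mean should approach the sample mean, and the posterior covariance should shrink like $1/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 5000

A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)   # likelihood covariance (SPD)
Sigma0 = np.eye(d)                # prior covariance
mu0 = np.zeros(d)                 # prior mean

x = rng.multivariate_normal(np.array([1.0, -2.0, 0.5]), Sigma, size=n)
xbar = x.mean(axis=0)

# Claimed posterior parameters
M = np.linalg.inv(Sigma0 + Sigma / n)
mu_n = Sigma0 @ M @ xbar + (Sigma / n) @ M @ mu0
Sigma_n = Sigma0 @ M @ (Sigma / n)

print(np.allclose(mu_n, xbar, atol=0.05))  # True: the data overwhelm the prior
print(np.max(np.abs(Sigma_n)))             # small: uncertainty shrinks like 1/n
```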

I am looking for the derivation of this result with all the intermediate matrix algebra.

Any help is much appreciated.

Chill2Macht
Alex
  • It is also solved in our book [Bayesian Core](http://www.amazon.com/gp/product/1441919392?ie=UTF8&tag=chrprobboo-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=1441919392), Chap. 3, Section 3.2, pages 54-57, with what we think is detailed matrix algebra! – Xi'an May 18 '12 at 20:10
  • @Alex: Sorry, wrong link, I meant [Bayesian Core](http://www.amazon.com/gp/product/1441922865/ref=as_li_ss_tl?ie=UTF8&tag=chrprobboo-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=1441922865). Note that we also posted [solutions to all problems on arXiv](http://xianblog.wordpress.com/2009/10/25/solution-manual-to-bayesian-core-on-line/). So posting a complete solution here would not hurt! – Xi'an May 19 '12 at 07:56
  • Just as an FYI, the derivation is in Pattern Classification by Duda, Hart and Stork. However, I was having difficulty following some of their steps, which only matters to me. If this were simply homework, one could just write down exactly what they have. – Alex May 19 '12 at 16:46

1 Answer


Start from the distributions of the random vectors:

$\mathbf x_i \mid \mu \sim N(\mu, \mathbf \Sigma)$

$\mu \sim N(\mu_0, \mathbf \Sigma_0)$

By Bayes' rule, the posterior density is proportional to the prior times the likelihood:

$p(\mu| \{\mathbf x_i\}) \propto p(\mu) \prod_{i=1}^N p(\mathbf x_i | \mu)$

So:

$\ln p(\mu| \{\mathbf x_i\}) = -\frac{1}{2}\sum_{i=1}^N(\mathbf x_i - \mu)'\mathbf \Sigma^{-1}(\mathbf x_i - \mu) -\frac{1}{2}(\mu - \mu_0)'\mathbf \Sigma_0^{-1}(\mu - \mu_0) + const$

Expanding the quadratic forms and absorbing every term that does not involve $\mu$ into the constant:

$ = -\frac{1}{2} N \mu' \mathbf \Sigma^{-1} \mu + \sum_{i=1}^N \mu' \mathbf \Sigma^{-1} \mathbf x_i -\frac{1}{2} \mu' \mathbf \Sigma_0^{-1} \mu + \mu' \mathbf \Sigma_0^{-1} \mu_0 + const$

$ = -\frac{1}{2} \mu' (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1}) \mu + \mu' (\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i) + const$

Completing the square in $\mu$ (the leftover terms involving only $\mu_0$ and the $\mathbf x_i$ are absorbed into the constant):

$= -\frac{1}{2}(\mu - (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1}(\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i))' (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1}) (\mu - (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1}(\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i)) + const$

Which is the log density of a Gaussian:

$\mu| \{\mathbf x_i\} \sim N((N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1}(\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i), (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1})$
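As a quick sanity check on this step (a sketch with arbitrary test matrices, not part of the derivation): the mean of a Gaussian is also its mode, so the claimed posterior mean $(N\mathbf\Sigma^{-1}+\mathbf\Sigma_0^{-1})^{-1}(\mathbf\Sigma_0^{-1}\mu_0 + \mathbf\Sigma^{-1}\sum_i \mathbf x_i)$ should maximize the unnormalized log posterior written above.

```python
import numpy as np

rng = np.random.default_rng(1)

d, N = 2, 10
Sigma = np.array([[1.5, 0.4], [0.4, 1.0]])    # likelihood covariance
Sigma0 = np.array([[2.0, 0.0], [0.0, 0.5]])   # prior covariance
mu0 = np.array([1.0, -1.0])                   # prior mean
x = rng.multivariate_normal([0.5, 0.5], Sigma, size=N)

Si, S0i = np.linalg.inv(Sigma), np.linalg.inv(Sigma0)

def log_post(mu):
    """Unnormalized log posterior ln p(mu | x_1..x_N)."""
    lik = -0.5 * sum((xi - mu) @ Si @ (xi - mu) for xi in x)
    pri = -0.5 * (mu - mu0) @ S0i @ (mu - mu0)
    return lik + pri

A = N * Si + S0i                       # posterior precision
b = S0i @ mu0 + Si @ x.sum(axis=0)
mu_n = np.linalg.solve(A, b)           # claimed posterior mean

# The claimed mean maximizes the (strictly concave) log posterior
for _ in range(100):
    delta = 0.1 * rng.standard_normal(d)
    assert log_post(mu_n) >= log_post(mu_n + delta)
print("mu_n is the mode:", mu_n)
```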

Using the Woodbury identity on our expression for the covariance matrix:

$(N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1} = \mathbf \Sigma(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \frac{1}{N} \mathbf \Sigma_0$
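This identity is easy to verify numerically for arbitrary symmetric positive-definite matrices (a quick sketch, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(2)
d, N = 4, 7

def random_spd(d):
    """Random symmetric positive-definite matrix."""
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)

Sigma, Sigma0 = random_spd(d), random_spd(d)

lhs = np.linalg.inv(N * np.linalg.inv(Sigma) + np.linalg.inv(Sigma0))
rhs = Sigma @ np.linalg.inv(Sigma / N + Sigma0) @ (Sigma0 / N)

assert np.allclose(lhs, rhs)
print("identity holds")
```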

This is the posterior covariance matrix in the form the OP wanted (with $N$ in place of $n$). Since this matrix is symmetric, it also equals its transpose $\frac{1}{N}\mathbf\Sigma_0(\frac{1}{N}\mathbf\Sigma + \mathbf\Sigma_0)^{-1}\mathbf\Sigma$; substituting one form into each term of the expression for the mean gives:

$\mathbf \Sigma(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \frac{1}{N} \mathbf \Sigma_0 \mathbf \Sigma_0^{-1} \mu_0 + \frac{1}{N} \mathbf \Sigma_0(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \mathbf \Sigma \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i$

$= \mathbf \Sigma(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \frac{1}{N} \mu_0 + \mathbf \Sigma_0(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \sum_{i=1}^N (\frac{1}{N} \mathbf x_i)$

Which is the form required by the OP for the mean.
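Putting the pieces together, one can check numerically (with arbitrary SPD test matrices; a sketch, not a proof) that the precision form derived above and the OP's form agree:

```python
import numpy as np

rng = np.random.default_rng(3)
d, N = 3, 12
A = rng.standard_normal((d, d)); Sigma = A @ A.T + d * np.eye(d)
B = rng.standard_normal((d, d)); Sigma0 = B @ B.T + d * np.eye(d)
mu0 = rng.standard_normal(d)
x = rng.multivariate_normal(np.zeros(d), Sigma, size=N)
xbar = x.mean(axis=0)

Si, S0i = np.linalg.inv(Sigma), np.linalg.inv(Sigma0)

# Precision form, from completing the square
Sigma_n1 = np.linalg.inv(N * Si + S0i)
mu_n1 = Sigma_n1 @ (S0i @ mu0 + Si @ x.sum(axis=0))

# OP's form, after applying the Woodbury identity
M = np.linalg.inv(Sigma0 + Sigma / N)
Sigma_n2 = Sigma0 @ M @ (Sigma / N)
mu_n2 = Sigma0 @ M @ xbar + (Sigma / N) @ M @ mu0

assert np.allclose(Sigma_n1, Sigma_n2)
assert np.allclose(mu_n1, mu_n2)
print("both forms agree")
```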

conjectures
  • Could you make your derivation and notation more explicit? It's not clear how you jumped to the posterior mean and covariance. – Soubriquet Apr 06 '20 at 16:33