14

When inferring the precision matrix $\boldsymbol{\Lambda}$ of a normal distribution used to generate $N$ $D$-dimensional vectors $\mathbf{x_1},\dots,\mathbf{x_N}$ \begin{align} \mathbf{x_i} &\sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Lambda}^{-1}) \end{align} we usually place a Wishart prior over $\boldsymbol{\Lambda}$, since the Wishart distribution is the conjugate prior for the precision of a multivariate normal distribution with known mean and unknown covariance: \begin{align} \boldsymbol{\Lambda} &\sim \mathcal{W}(\upsilon, \boldsymbol{\Lambda_0}) \end{align} where $\upsilon$ is the number of degrees of freedom and $\boldsymbol{\Lambda_0}$ is the scale matrix. To add robustness and flexibility to the model, we put a hyperprior over the parameters of the Wishart. For instance, Görür and Rasmussen suggest: \begin{align} \boldsymbol{\Lambda_0} &\sim \mathcal{W}(D, \frac{1}{D}\boldsymbol{\Lambda_x}) \\ \frac{1}{\upsilon-D + 1} &\sim \mathcal{G}(1, \frac{1}{D}) \end{align} where $\mathcal{G}$ is the Gamma distribution.

Question:

In order to sample from the posterior of $\boldsymbol{\Lambda_0}$ \begin{align} p(\boldsymbol{\Lambda_0 | X, \Lambda}, \upsilon, D, \boldsymbol{\Lambda_x}) \propto \mathcal{W}(\boldsymbol{\Lambda} | \upsilon, \boldsymbol{\Lambda_0}) \mathcal{W}(\boldsymbol{\Lambda_0} |D, \frac{1}{D}\boldsymbol{\Lambda_x}) \end{align}

what are the family and the parameters of this posterior?

PS:

Dropping all factors that do not depend on $\boldsymbol{\Lambda_0}$ and matching the remaining terms to the parameters of a Wishart, I get a Wishart with parameters: \begin{align} \upsilon' &= \upsilon + D\\ \boldsymbol{\Lambda'} &= \boldsymbol{\Lambda} + \boldsymbol{\Lambda_x} \end{align}

which looks quite nice, but I am not confident at all, since I cannot find any example either in books or on the internet.

Erratum:

Görür and Rasmussen suggest those hyperpriors over the Wishart parameters, but this equation: \begin{align} \boldsymbol{\Lambda} &\sim \mathcal{W}(\upsilon, \boldsymbol{\Lambda_0}) \end{align}

should instead be: \begin{align} \boldsymbol{\Lambda} &\sim \mathcal{W}(\upsilon, \boldsymbol{\Lambda_0}^{-1}) \end{align}

which resolves the lack of conjugacy. If we want to keep $\boldsymbol{\Lambda_0}$ as the scale matrix, then we should use the Inverse Wishart as its prior (see @Xi'an's answer).

alberto

2 Answers

8

OK, thanks to @Xi'an's answer I could work through the whole derivation. I will write it for the general case: \begin{align} \mathcal{W}(\mathbf{W} | \upsilon, \mathbf{S}^{-1} ) \times \mathcal{W}(\mathbf{S} | \upsilon_0, \mathbf{S_0}) \end{align} where the $\mathbf{S}^{-1}$ is the key to conjugacy. If we want to use $\mathbf{S}$ as the scale matrix instead, then the prior should be an Inverse Wishart: \begin{align} \mathcal{W}(\mathbf{W} | \upsilon, \mathbf{S} ) \times \mathcal{IW}(\mathbf{S} | \upsilon_0, \mathbf{S_0}) \end{align}

I'm doing the first case (please correct me if I am wrong): \begin{align} \mathcal{W}(\mathbf{W} | \upsilon, \mathbf{S}^{-1} ) \times \mathcal{W}(\mathbf{S} | \upsilon_0, \mathbf{S_0}) &\propto |\mathbf{S}|^{\upsilon/2} \exp\{-\frac{1}{2} \operatorname{tr}(\mathbf{S}\mathbf{W}) \}\\ &\times |\mathbf{S}|^{\frac{\upsilon_0 - D -1 }{2}} \exp\{-\frac{1}{2} \operatorname{tr}(\mathbf{S_0}^{-1} \mathbf{S})\}\\ &\propto |\mathbf{S}|^{\frac{\upsilon + \upsilon_0 - D -1 }{2}} \exp\{-\frac{1}{2} \operatorname{tr}( (\mathbf{W} + \mathbf{S_0}^{-1}) \mathbf{S})\} \end{align}

where we used the fact that $\operatorname{tr}(\mathbf{S}\mathbf{W}) = \operatorname{tr}(\mathbf{W}\mathbf{S})$. By inspection, we see that this is a Wishart distribution: \begin{align} p(\mathbf{S} | \cdot) = \mathcal{W}(\upsilon+ \upsilon_0, (\mathbf{W}+\mathbf{S_0}^{-1})^{-1}) \end{align}
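A quick numerical sanity check of this result can be sketched as follows. It assumes that SciPy's `scipy.stats.wishart` uses the same (`df`, `scale`) parameterization as the derivation above; the dimension, the degrees of freedom, and the helpers `random_spd` and `sym_inv` are illustrative choices, not part of the original derivation. If the posterior is right, the log of (likelihood × prior) minus the log of the claimed posterior must be the same constant for every $\mathbf{S}$.

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(0)
D = 3
nu, nu0 = 5.0, 4.0  # illustrative degrees of freedom (both must exceed D - 1)


def random_spd(dim):
    """Random symmetric positive-definite matrix (illustrative helper)."""
    A = rng.standard_normal((dim, dim))
    return A @ A.T + dim * np.eye(dim)


def sym_inv(M):
    """Matrix inverse, symmetrized to remove floating-point asymmetry."""
    M_inv = np.linalg.inv(M)
    return (M_inv + M_inv.T) / 2


W = random_spd(D)   # the observed Wishart draw
S0 = random_spd(D)  # scale matrix of the prior over S

# Posterior parameters claimed by the derivation.
nu_post = nu + nu0
scale_post = sym_inv(W + sym_inv(S0))


def log_ratio(S):
    """log[likelihood * prior] - log[claimed posterior], evaluated at S."""
    log_lik = wishart.logpdf(W, df=nu, scale=sym_inv(S))
    log_prior = wishart.logpdf(S, df=nu0, scale=S0)
    log_post = wishart.logpdf(S, df=nu_post, scale=scale_post)
    return log_lik + log_prior - log_post


# Evaluate at several unrelated values of S; the result should not depend on S.
ratios = np.array([log_ratio(random_spd(D)) for _ in range(5)])
print(np.allclose(ratios, ratios[0]))
```

The common value of `ratios` is the log marginal density of $\mathbf{W}$, which is why it does not depend on $\mathbf{S}$.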

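Extension for $N$ draws $\mathbf{W_1},\dots,\mathbf{W_N}$:

When we observe $N$ independent matrices $\mathbf{W_i} \sim \mathcal{W}(\upsilon, \mathbf{S}^{-1})$, the likelihood becomes a product of $N$ Wishart densities. The exponents of $|\mathbf{S}|$ and the traces add up, and we get:

\begin{align} p(\mathbf{S} | \cdot) = \mathcal{W}\left(N \upsilon+ \upsilon_0, \left(\sum_{i=1}^N \mathbf{W_i}+\mathbf{S_0}^{-1}\right)^{-1}\right) \end{align}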
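For use inside a Gibbs sampler, the update above can be sketched in Python as follows. This is only a sketch under the same assumption that `scipy.stats.wishart` matches the parameterization used here; the function name `posterior_params` and all numeric values (including `S_true`) are hypothetical and chosen purely for illustration.

```python
import numpy as np
from scipy.stats import wishart


def posterior_params(Ws, nu, nu0, S0):
    """Return (df, scale) of the Wishart posterior over S.

    Ws  : array of shape (N, D, D), draws W_i ~ W(nu, S^{-1})
    nu  : degrees of freedom of each draw
    nu0 : degrees of freedom of the Wishart prior over S
    S0  : scale matrix of the Wishart prior over S
    """
    Ws = np.asarray(Ws)
    N = Ws.shape[0]
    df = N * nu + nu0
    scale = np.linalg.inv(Ws.sum(axis=0) + np.linalg.inv(S0))
    return df, (scale + scale.T) / 2  # symmetrize against round-off


rng = np.random.default_rng(1)
D, nu, nu0 = 2, 4.0, 3.0
S0 = np.eye(D)
S_true = np.array([[2.0, 0.3], [0.3, 1.0]])  # hypothetical "true" S

# Simulate N = 50 draws W_i ~ W(nu, S_true^{-1}).
Ws = wishart.rvs(df=nu, scale=np.linalg.inv(S_true), size=50, random_state=rng)

# Conjugate update, followed by one posterior draw of S.
df_post, scale_post = posterior_params(Ws, nu, nu0, S0)
S_sample = wishart.rvs(df=df_post, scale=scale_post, random_state=rng)
```

The posterior mean is `df_post * scale_post`, which can be compared against `S_true` as a rough plausibility check on simulated data.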

alberto
6

The product of the two densities in $$ p(\boldsymbol{\Lambda_0 | X, \Lambda}, \upsilon, D, \boldsymbol{\Lambda_x}) \propto \mathcal{W}(\boldsymbol{\Lambda} | \upsilon, \boldsymbol{\Lambda_0}) \mathcal{W}(\boldsymbol{\Lambda_0} |D, \frac{1}{D}\boldsymbol{\Lambda_x}) $$ leads to \begin{align*} p(\boldsymbol{\Lambda_0 | X, \Lambda}, \upsilon, D, \boldsymbol{\Lambda_x}) &\propto |\boldsymbol{\Lambda_0}|^{-\upsilon/2}\,\exp\{-\text{tr}(\boldsymbol{\Lambda_0}^{-1}\boldsymbol{\Lambda})/2\}\\ &\times |\boldsymbol{\Lambda_0}|^{(D-p-1)/2}\,\exp\{-D\,\text{tr}(\boldsymbol{\Lambda_x}^{-1}\boldsymbol{\Lambda_0})/2\}\\ &\propto|\boldsymbol{\Lambda_0}|^{(D-\upsilon-p-1)/2}\,\exp\{-\text{tr}(\boldsymbol{\Lambda_0}^{-1}\boldsymbol{\Lambda}+D\,\boldsymbol{\Lambda_x}^{-1}\boldsymbol{\Lambda_0})/2\}\,, \end{align*}
which does not appear to be a standard density (here $p$ denotes the dimension of the matrices, so $p=D$ in the notation of the question). To keep conjugacy of sorts, the right hierarchical prior on $\boldsymbol{\Lambda_0}$ should be something like $$ \boldsymbol{\Lambda_0}\sim\mathcal{IW}(\boldsymbol{\Lambda_0} |D, \frac{1}{D}\boldsymbol{\Lambda_x})\,. $$
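
As a sketch of why an inverse Wishart prior restores conjugacy, assume the common parameterization $\mathcal{IW}(\boldsymbol{\Lambda_0}\,|\,\upsilon_0,\boldsymbol{\Psi})\propto|\boldsymbol{\Lambda_0}|^{-(\upsilon_0+p+1)/2}\exp\{-\text{tr}(\boldsymbol{\Psi}\boldsymbol{\Lambda_0}^{-1})/2\}$, with $p$ the dimension of the matrices (this parameterization is an assumption; other conventions shift the parameters). Taking $\upsilon_0=D$ and $\boldsymbol{\Psi}=\frac{1}{D}\boldsymbol{\Lambda_x}$, the product becomes \begin{align*} \mathcal{W}(\boldsymbol{\Lambda} | \upsilon, \boldsymbol{\Lambda_0})\,\mathcal{IW}(\boldsymbol{\Lambda_0} | D, \tfrac{1}{D}\boldsymbol{\Lambda_x}) &\propto |\boldsymbol{\Lambda_0}|^{-\upsilon/2}\,\exp\{-\text{tr}(\boldsymbol{\Lambda_0}^{-1}\boldsymbol{\Lambda})/2\}\\ &\times |\boldsymbol{\Lambda_0}|^{-(D+p+1)/2}\,\exp\{-\text{tr}(\tfrac{1}{D}\boldsymbol{\Lambda_x}\boldsymbol{\Lambda_0}^{-1})/2\}\\ &\propto |\boldsymbol{\Lambda_0}|^{-(\upsilon+D+p+1)/2}\,\exp\{-\text{tr}((\boldsymbol{\Lambda}+\tfrac{1}{D}\boldsymbol{\Lambda_x})\boldsymbol{\Lambda_0}^{-1})/2\}\,, \end{align*} which, under that parameterization, is the kernel of an $\mathcal{IW}(\upsilon+D,\ \boldsymbol{\Lambda}+\frac{1}{D}\boldsymbol{\Lambda_x})$ density.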

Xi'an
    Thanks for the hint, @Xi'an! Actually, the scale parameter in the likelihood should be $\boldsymbol{\Lambda_0}^{-1}$ (my fault, see the edit). I just posted an answer that uses this and keeps the Wishart × Wishart structure. – alberto Dec 19 '14 at 12:24