
I'm struggling with the computation of the Fisher information matrix for a hierarchical Bayesian model. For simplicity, consider the following hierarchical Bayesian model:

\begin{align} X|\sigma &\sim N(0,\sigma^2) \\ \sigma|\gamma &\sim LN(\gamma,0) \\ \pi(\gamma) &\propto 1 \end{align}

$X$ is an observable data point, $\pi(\gamma)\propto1$ means that the unknown parameter $\gamma$ has a flat prior distribution, and $LN$ denotes the log-normal distribution.

The likelihood of the data is then:

$$f(x|\sigma,\gamma)\propto\frac{1}{\sigma}e^{-\frac{x^2}{2\sigma^2}}\cdot\frac{1}{\sigma}e^{-\frac{(\ln\sigma-\gamma)^2}{2}}$$

The log-likelihood, up to an additive constant, is then:

$$L(x|\sigma,\gamma)=-2\ln\sigma-\frac{x^2}{2\sigma^2}-\frac{(\ln\sigma-\gamma)^2}{2}$$

First-order derivatives:

\begin{align} \partial_{\sigma}L &= -\frac{2}{\sigma}+\frac{x^2}{\sigma^3}-\frac{1}{\sigma}(\ln\sigma-\gamma) \\ \partial_{\gamma}L &= \ln\sigma-\gamma. \end{align}
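These derivatives can be double-checked numerically against central finite differences of the log-likelihood; here is a quick stdlib-Python sketch (the evaluation point is an arbitrary choice):

```python
import math

def loglik(x, sigma, gamma):
    # L = -2 ln(sigma) - x^2 / (2 sigma^2) - (ln(sigma) - gamma)^2 / 2
    return (-2*math.log(sigma) - x**2/(2*sigma**2)
            - (math.log(sigma) - gamma)**2/2)

def dL_dsigma(x, sigma, gamma):
    return -2/sigma + x**2/sigma**3 - (math.log(sigma) - gamma)/sigma

def dL_dgamma(x, sigma, gamma):
    return math.log(sigma) - gamma

# central finite differences as an independent check (arbitrary test point)
x, sigma, gamma, h = 1.3, 0.8, -0.5, 1e-6
fd_sigma = (loglik(x, sigma + h, gamma) - loglik(x, sigma - h, gamma)) / (2*h)
fd_gamma = (loglik(x, sigma, gamma + h) - loglik(x, sigma, gamma - h)) / (2*h)

print(abs(fd_sigma - dL_dsigma(x, sigma, gamma)) < 1e-6)  # True
print(abs(fd_gamma - dL_dgamma(x, sigma, gamma)) < 1e-6)  # True
```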

Second-order derivatives are then:

\begin{align} \partial_{\sigma}^2L &= \frac{1}{\sigma^2}-\frac{3x^2}{\sigma^4}+\frac{1}{\sigma^2}(\ln\sigma-\gamma) \\[5pt] \partial_{\sigma,\gamma}^2L &= \frac{1}{\sigma} \\[5pt] \partial_{\gamma}^2L &= -1 \end{align}

Thus, the hessian of the log-likelihood is: $$\newcommand{\Hess}{{\rm Hess}} \Hess(L)=\begin{pmatrix} \frac{1}{\sigma^2}-\frac{3x^2}{\sigma^4}+\frac{1}{\sigma^2}(\ln\sigma-\gamma)&\frac{1}{\sigma} \\ \frac{1}{\sigma}&-1 \end{pmatrix}$$

The Fisher information metric is defined as follows:

$$G=-E_{p(x,\theta)}\big[ \Hess(L) \big]$$

Since $E_{f(x,\theta)}\left[x^2\right]=\sigma^2$, we have that the Fisher metric is:

$$G=\begin{pmatrix} -\frac{-2+\ln\sigma-\gamma}{\sigma^2}&-\frac{1}{\sigma} \\ -\frac{1}{\sigma}&1 \end{pmatrix}$$
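This closed form can be cross-checked by Monte Carlo: sample $x|\sigma\sim N(0,\sigma^2)$, average $-{\rm Hess}(L)$, and compare with $G$. A stdlib-Python sketch (the parameter values and sample size are arbitrary choices):

```python
import math
import random

def hess(x, sigma, gamma):
    # Hessian of the log-likelihood, entries as in the derivation above
    h11 = 1/sigma**2 - 3*x*x/sigma**4 + (math.log(sigma) - gamma)/sigma**2
    h12 = 1/sigma
    return [[h11, h12], [h12, -1.0]]

def G_closed(sigma, gamma):
    # the closed-form Fisher metric derived above
    g11 = -(-2 + math.log(sigma) - gamma)/sigma**2
    return [[g11, -1/sigma], [-1/sigma, 1.0]]

rng = random.Random(0)
sigma, gamma, n = 1.5, 0.3, 200_000   # arbitrary example values

acc = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(n):
    x = rng.gauss(0, sigma)           # x | sigma ~ N(0, sigma^2)
    H = hess(x, sigma, gamma)
    for i in range(2):
        for j in range(2):
            acc[i][j] -= H[i][j] / n  # G = -E[Hess(L)]

print(acc)                  # Monte Carlo estimate of -E[Hess(L)]
print(G_closed(sigma, gamma))
```

The Monte Carlo average agrees with the closed form (up to sampling noise in the top-left entry), so the algebra above is consistent.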

But (and here comes the twist) if we plug in the parameter values $\sigma=1$ and $\gamma=-3$, we obtain an indefinite matrix:

$$G=\begin{pmatrix} -1&-1 \\ -1&1 \end{pmatrix}$$
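A quick eigenvalue computation (stdlib Python, using the closed form for a symmetric $2\times2$ matrix) shows one negative and one positive eigenvalue, so this matrix is not positive semi-definite:

```python
import math

# the matrix obtained at sigma = 1, gamma = -3
G = [[-1.0, -1.0], [-1.0, 1.0]]

# closed-form eigenvalues of a symmetric 2x2 matrix [[a, b], [b, d]]
a, b, d = G[0][0], G[0][1], G[1][1]
tr, det = a + d, a*d - b*b
disc = math.sqrt(tr*tr - 4*det)
eig_lo, eig_hi = (tr - disc)/2, (tr + disc)/2

print(eig_lo, eig_hi)  # -1.414..., 1.414...: one of each sign, so indefinite
```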

But the Fisher information matrix has to be positive semi-definite. So, my question is: what am I doing wrong?

I have a hunch that the expectation has to be taken bearing in mind that the parameter $\sigma$ is also a random variable, not just the data $X$; for example, $E_{f(X,\sigma,\gamma)}\left[\sigma\right]\neq\sigma$ but rather $E_{f(X,\sigma,\gamma)}\left[\sigma\right]=e^{\gamma+\frac{1}{2}}$, since $\sigma$ has a log-normal distribution.
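That lognormal mean can itself be checked numerically; here is a small stdlib-Python Monte Carlo sketch, taking the log-scale variance to be 1 as the density above implies (the value of $\gamma$ and the sample size are arbitrary choices):

```python
import math
import random

random.seed(0)
gamma = -3.0   # arbitrary example value
n = 200_000

# sigma | gamma ~ LN(gamma, 1), i.e. sigma = exp(gamma + Z) with Z ~ N(0, 1)
mean_sigma = sum(math.exp(gamma + random.gauss(0, 1)) for _ in range(n)) / n

print(mean_sigma)              # close to exp(gamma + 1/2)
print(math.exp(gamma + 0.5))   # ≈ 0.0821
```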

I searched for literature on forming the Fisher information matrix for hierarchical models, but in vain.

Tomas
  • You have chosen the second parameter to be 0 in the log-normal distribution, but this parameter must be strictly positive (http://en.wikipedia.org/wiki/Log-normal_distribution). I also noticed, in your second derivative $\partial_\sigma^2$, it should be $2/\sigma^2$ instead of $1/\sigma^2$. – MLaz Aug 27 '14 at 22:19
  • @MLaz The derivative is calculated correctly, and the parameter $\gamma$ can be any real number. The variance parameter of the lognormal distribution is strictly positive, but in my case it is equal to 1. – Tomas Aug 28 '14 at 04:54
  • Apologies for the (dreadful) derivative error. In your equation 2, you have $\sigma|\gamma\sim\mathrm{LN}(\gamma,0)$. Presumably it should be $\sigma|\gamma\sim\mathrm{LN}(\gamma,1)$, and this then matches equation 4. Having another look at your question, I don't see why equation 4 is "the likelihood of the data". The posterior distribution is $p(x|\sigma,\gamma)p(\sigma|\gamma)p(\gamma)/p(x)$. Here $p(x|\sigma,\gamma)$ is a conditional likelihood, conditioning on $\sigma$ and $\gamma$, sometimes also called the complete data or augmented data likelihood... – MLaz Aug 28 '14 at 12:32
  • ...There is the observed data likelihood $p(x|\gamma)=\int p(x|\sigma,\gamma)p(\sigma|\gamma)\,\mathrm{d}\sigma$; it is the complete data likelihood with $\sigma$ integrated out, sometimes known as the integrated likelihood. If you look at your equation 4, the second part of the equation does not involve the data: it is a parameter prior distribution. Similarly, $p(\gamma)$ is a hyperprior. Therefore, I don't think equation 4 represents the likelihood of the data. Reference: Applied Bayesian Hierarchical Models, Peter D. Congdon, 2010. – MLaz Aug 28 '14 at 12:32

1 Answer


As indicated in the comment by MLaz, your likelihood is not right: you have to either consider the conditional likelihood, $f(x|\sigma)$, for which you get the standard Normal Fisher information $$\mathfrak{I}(\sigma)={2}\big/{\sigma^2},$$ or the integrated likelihood $$\int f(x|\sigma)\pi(\sigma|\gamma)\,\text{d}\sigma,$$ which does not enjoy a closed-form expression and hence is very unlikely to induce a closed-form Fisher information $\mathfrak{I}(\gamma)$.
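Even without a closed form, the integrated likelihood can be approximated by Monte Carlo: draw $\sigma_i\sim LN(\gamma,1)$ and average the conditional densities $f(x|\sigma_i)$. A stdlib-Python sketch (the sample size, seed, and the values in the usage line are arbitrary choices):

```python
import math
import random

def integrated_lik(x, gamma, n=100_000, seed=1):
    # Monte Carlo estimate of p(x | gamma) = ∫ N(x; 0, σ²) LN(σ; γ, 1) dσ:
    # draw σ_i ~ LN(γ, 1) and average the conditional densities N(x; 0, σ_i²).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        sigma = math.exp(gamma + rng.gauss(0, 1))   # σ ~ LN(γ, 1)
        total += math.exp(-x*x / (2*sigma*sigma)) / (sigma * math.sqrt(2*math.pi))
    return total / n

# arbitrary example: marginal density of x = 0.5 under γ = 0
print(integrated_lik(0.5, 0.0))
```

The resulting estimate could then be differentiated numerically in $\gamma$ to approximate $\mathfrak{I}(\gamma)$, at the cost of Monte Carlo noise.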

Note that, in both cases, the improper hyperprior on $\gamma$ is not used. This raises the question as to why you are interested in Fisher's information in that case.

Xi'an
  • Thanks for the answer. Regarding the doubt about the improper hyperprior: since $\gamma$ is unknown in my case as well, I included it in the list of unknowns. Therefore I have two parameters, $\sigma^2$ and $\gamma$, for which I wanted to calculate the Fisher information. However, I do realise now that I needed an integrated log-likelihood. But since it does not have an analytic expression in general, the Fisher information for hierarchical models cannot be obtained without approximate methods. Right? – Tomas Jan 01 '15 at 13:30
  • Right. There are several examples on CrossValidated: for instance in [this normal-Laplace model](http://stats.stackexchange.com/q/8867/7224). – Xi'an Jan 01 '15 at 14:24