I'm struggling with the computation of the Fisher information matrix for a hierarchical Bayesian model. For simplicity, consider the following hierarchical Bayesian model:
\begin{align} X|\sigma &\sim N(0,\sigma^2) \\ \sigma|\gamma &\sim LN(\gamma,1) \\ \pi(\gamma) &\propto 1 \end{align}
Here $X$ is an observable data point, $\pi(\gamma)\propto1$ means that the unknown parameter $\gamma$ has a flat prior distribution, and $LN$ denotes the log-normal distribution.
The likelihood of the data is then, up to multiplicative constants:
$$f(x|\sigma,\gamma)\propto\frac{1}{\sigma}e^{-\frac{x^2}{2\sigma^2}}\cdot\frac{1}{\sigma}e^{-\frac{(\ln\sigma-\gamma)^2}{2}}$$
The log-likelihood, up to an additive constant, is then:
$$L(x|\sigma,\gamma)=-2\ln\sigma-\frac{x^2}{2\sigma^2}-\frac{(\ln\sigma-\gamma)^2}{2}$$
First-order derivatives:
\begin{align} \partial_{\sigma}L &= -\frac{2}{\sigma}+\frac{x^2}{\sigma^3}-\frac{1}{\sigma}(\ln\sigma-\gamma) \\ \partial_{\gamma}L &= \ln\sigma-\gamma. \end{align}
The second-order derivatives are then:
\begin{align} \partial_{\sigma}^2L &= \frac{1}{\sigma^2}-\frac{3x^2}{\sigma^4}+\frac{1}{\sigma^2}(\ln\sigma-\gamma) \\[5pt] \partial_{\sigma,\gamma}^2L &= \frac{1}{\sigma} \\[5pt] \partial_{\gamma}^2L &= -1 \end{align}
Thus, the hessian of the log-likelihood is: $$\newcommand{\Hess}{{\rm Hess}} \Hess(L)=\begin{pmatrix} \frac{1}{\sigma^2}-\frac{3x^2}{\sigma^4}+\frac{1}{\sigma^2}(\ln\sigma-\gamma)&\frac{1}{\sigma} \\ \frac{1}{\sigma}&-1 \end{pmatrix}$$
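As a sanity check on the algebra, here is a quick symbolic verification with sympy (a sketch, assuming sympy is available; the symbol names are mine):

```python
import sympy as sp

x, sigma = sp.symbols('x sigma', positive=True)
gamma = sp.symbols('gamma', real=True)

# Log-likelihood up to an additive constant, as above
L = -2*sp.log(sigma) - x**2/(2*sigma**2) - (sp.log(sigma) - gamma)**2/2

d_sigma = sp.diff(L, sigma)
d_gamma = sp.diff(L, gamma)
H = sp.hessian(L, (sigma, gamma))

# Each difference simplifies to 0 if the hand-computed derivatives are right
print(sp.simplify(d_sigma - (-2/sigma + x**2/sigma**3 - (sp.log(sigma) - gamma)/sigma)))  # 0
print(sp.simplify(d_gamma - (sp.log(sigma) - gamma)))  # 0
print(sp.simplify(H[0, 1] - 1/sigma), H[1, 1])  # 0 -1
```

All differences simplify to zero, so the Hessian above is correct.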
The Fisher information metric is defined as follows:
$$G=-E_{p(x,\theta)}\big[ \Hess(L) \big]$$
Since $E\left[x^2\right]=\sigma^2$, the Fisher metric is:
$$G=\begin{pmatrix} -\frac{-2+\ln\sigma-\gamma}{\sigma^2}&-\frac{1}{\sigma} \\ -\frac{1}{\sigma}&1 \end{pmatrix}$$
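The substitution $E[x^2]=\sigma^2$ can also be checked mechanically (again a sympy sketch under my own naming):

```python
import sympy as sp

x, sigma = sp.symbols('x sigma', positive=True)
gamma = sp.symbols('gamma', real=True)

L = -2*sp.log(sigma) - x**2/(2*sigma**2) - (sp.log(sigma) - gamma)**2/2
# Negate the Hessian and replace x^2 by its expectation sigma^2
G = -sp.hessian(L, (sigma, gamma)).subs(x**2, sigma**2)

print(sp.simplify(G[0, 0] - (2 - sp.log(sigma) + gamma)/sigma**2))  # 0
print(G[0, 1], G[1, 1])  # -1/sigma 1
```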
But (and here comes the twist), if we plug in the parameter values $\sigma=1$ and $\gamma=-3$, we obtain an indefinite matrix:
$$G=\begin{pmatrix} -1&-1 \\ -1&1 \end{pmatrix}$$
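The eigenvalues confirm this, for instance with numpy:

```python
import numpy as np

G = np.array([[-1.0, -1.0],
              [-1.0,  1.0]])
# Symmetric matrix, so eigvalsh applies; eigenvalues come out ascending
print(np.linalg.eigvalsh(G))  # [-1.4142...  1.4142...], one negative eigenvalue
```

One eigenvalue is $-\sqrt{2}<0$, so this $G$ is not positive semi-definite.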
But the Fisher information matrix has to be positive semi-definite. So, my question is: what am I doing wrong?
I have a hunch that the expectation has to be taken bearing in mind that the parameter $\sigma$ is also a random variable, not just the data $X$; for example, $E_{f(X,\sigma,\gamma)}\left[\sigma\right]\neq\sigma$ but rather $E_{f(X,\sigma,\gamma)}\left[\sigma\right]=e^{\gamma+\frac{1}{2}}$, since $\sigma$ is log-normally distributed.
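That log-normal moment is easy to confirm by Monte Carlo (a sketch; the sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = -3.0
# sigma ~ LN(gamma, 1): numpy's `mean`/`sigma` arguments are the log-scale parameters
draws = rng.lognormal(mean=gamma, sigma=1.0, size=1_000_000)
print(draws.mean(), np.exp(gamma + 0.5))  # both approximately 0.082
```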
I searched for literature on constructing the Fisher information matrix for hierarchical models, but in vain.