
Let $\theta \in \mathbb{R}^{n}$. The Fisher Information Matrix is defined as:

$$I(\theta)_{i,j} = -E\left[\frac{\partial^{2} \log(f(X|\theta))}{\partial \theta_{i} \partial \theta_{j}}\bigg|\theta\right]$$

How can I prove the Fisher Information Matrix is positive semidefinite?

gung - Reinstate Monica
madprob
  • Isn't it the expected value of an outer product of the score with itself? – Neil G Feb 13 '13 at 21:26
  • @NeilG Not necessarily. People can define Fisher's information as the expectation of the Hessian matrix of the log-likelihood function. Then, only under "certain regularity conditions" do we have Fisher's information equal to the variance of the score vector (the gradient of the log-likelihood function). – Tan Feb 27 '22 at 19:58

2 Answers


Check this out: http://en.wikipedia.org/wiki/Fisher_information#Matrix_form

From the definition, we have

$$ I_{ij} = \mathrm{E}_\theta \left[ \left(\partial_i \log f_{X\mid\Theta}(X\mid\theta)\right) \left(\partial_j \log f_{X\mid\Theta}(X\mid\theta)\right)\right] \, , $$ for $i,j=1,\dots,k$, in which $\partial_i=\partial /\partial \theta_i$. Your expression for $I_{ij}$ follows from this one under regularity conditions.
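To see why the two forms agree (a sketch, assuming the order of differentiation and integration can be exchanged): differentiating twice gives $$ \partial_i \partial_j \log f_{X\mid\Theta}(X\mid\theta) = \frac{\partial_i \partial_j f_{X\mid\Theta}(X\mid\theta)}{f_{X\mid\Theta}(X\mid\theta)} - \left(\partial_i \log f_{X\mid\Theta}(X\mid\theta)\right) \left(\partial_j \log f_{X\mid\Theta}(X\mid\theta)\right) \, , $$ and the expectation of the first term on the right vanishes, since $\mathrm{E}_\theta\!\left[\partial_i \partial_j f_{X\mid\Theta}(X\mid\theta)/f_{X\mid\Theta}(X\mid\theta)\right] = \int \partial_i \partial_j f_{X\mid\Theta}(x\mid\theta)\,dx = \partial_i \partial_j \int f_{X\mid\Theta}(x\mid\theta)\,dx = 0$. Taking expectations of both sides then recovers $-\mathrm{E}_\theta\left[\partial_i \partial_j \log f_{X\mid\Theta}(X\mid\theta)\right] = I_{ij}$.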

For a nonnull vector $u = (u_1,\dots,u_k)^\top\in\mathbb{R}^k$, it follows from the linearity of the expectation that $$ \sum_{i,j=1}^k u_i I_{ij} u_j = \sum_{i,j=1}^k \left( u_i \mathrm{E}_\theta \left[ \left(\partial_i \log f_{X\mid\Theta}(X\mid\theta)\right) \left(\partial_j \log f_{X\mid\Theta}(X\mid\theta)\right)\right] u_j \right) \\ = \mathrm{E}_\theta \left[ \left(\sum_{i=1}^k u_i \partial_i \log f_{X\mid\Theta}(X\mid\theta)\right) \left(\sum_{j=1}^k u_j \partial_j \log f_{X\mid\Theta} (X\mid\theta)\right)\right] \\ = \mathrm{E}_\theta \left[ \left(\sum_{i=1}^k u_i \partial_i \log f_{X\mid\Theta}(X\mid\theta)\right)^2 \right] \geq 0 \, . $$

If this componentwise notation is too ugly, note that the Fisher Information matrix $H=(I_{ij})$ can be written as $H = \mathrm{E}_\theta\left[S S^\top\right]$, in which the score vector $S$ is defined as $$ S = \left( \partial_1 \log f_{X\mid\Theta}(X\mid\theta), \dots, \partial_k \log f_{X\mid\Theta}(X\mid\theta) \right)^\top \, . $$

Hence, we have the one-liner $$ u^\top H u = u^\top \mathrm{E}_\theta[S S^\top] u = \mathrm{E}_\theta[u^\top S S^\top u] = \mathrm{E}_\theta\left[\left(S^\top u\right)^2\right] \geq 0. $$
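As an illustrative aside (not part of the original answer), this outer-product form lends itself to a quick numerical sanity check; the sketch below assumes a toy Normal$(\mu,\sigma)$ model, estimates $H = \mathrm{E}_\theta[SS^\top]$ by Monte Carlo, and confirms the estimate's eigenvalues are non-negative.

```python
import numpy as np

# Illustrative Monte Carlo check (toy Normal(mu, sigma) model, not from the answer):
# estimate H = E[S S^T] from score outer products and verify it is positive semi-definite.
rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=100_000)

# Score vector S = (d log f / d mu, d log f / d sigma) for the Normal density
s_mu = (x - mu) / sigma**2
s_sigma = (x - mu) ** 2 / sigma**3 - 1.0 / sigma
S = np.stack([s_mu, s_sigma], axis=1)   # one score vector per observation

H = S.T @ S / len(x)                    # Monte Carlo estimate of E[S S^T]
print(H)                                # approx. diag(1/sigma^2, 2/sigma^2)
print(np.linalg.eigvalsh(H))            # eigenvalues are all >= 0 (up to MC error)
```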

Zen
    (+1) Good answer and welcome back, Zen. I was becoming concerned we might have lost you permanently given the length of your hiatus. That would have been a real shame! – cardinal Feb 14 '13 at 02:56
  • How about the quantum Fisher information, what would be the difference in the proof? – wondering Nov 09 '21 at 13:05

WARNING: not a general answer!

If $f(X|\theta)$ corresponds to a full-rank exponential family in its natural parametrization, then the negative Hessian of the log-likelihood is the covariance matrix of the sufficient statistic. Covariance matrices are always positive semi-definite. Since the Fisher information is a convex combination of positive semi-definite matrices, it must also be positive semi-definite.
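For a concrete sketch of why this holds, write the family in its natural-parameter form $f(x\mid\theta) = h(x)\exp\left(\theta^\top T(x) - A(\theta)\right)$. Then $$ -\frac{\partial^2 \log f(x\mid\theta)}{\partial \theta_i \partial \theta_j} = \frac{\partial^2 A(\theta)}{\partial \theta_i \partial \theta_j} = \mathrm{Cov}_\theta\left(T_i(X), T_j(X)\right) \, , $$ which does not depend on $x$ and, being a covariance matrix, is positive semi-definite.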

gusl