
Let $\theta \in \mathbb{R}^{n}$. The Fisher Information Matrix is defined as:

$$I(\theta)_{i,j} = -E\left[\frac{\partial^{2} \log(f(X|\theta))}{\partial \theta_{i} \partial \theta_{j}}\bigg|\theta\right]$$

How can I prove the Fisher Information Matrix is positive semidefinite?

gung - Reinstate Monica
madprob
  • Isn't it the expected value of an outer product of the score with itself? – Neil G Feb 13 '13 at 21:26
  • @NeilG Not necessarily. People can define Fisher's information as the expectation of the Hessian matrix of the log-likelihood function. Then, only under "certain regularity conditions" do we have Fisher's information equal to the variance of the score vector (the gradient of the log-likelihood function). – Tan Feb 27 '22 at 19:58

2 Answers


Check this out: http://en.wikipedia.org/wiki/Fisher_information#Matrix_form

From the definition, we have

$$ I_{ij} = \mathrm{E}_\theta \left[ \left(\partial_i \log f_{X\mid\Theta}(X\mid\theta)\right) \left(\partial_j \log f_{X\mid\Theta}(X\mid\theta)\right)\right] \, , $$ for $i,j=1,\dots,k$, in which $\partial_i=\partial /\partial \theta_i$. Your expression for $I_{ij}$ follows from this one under regularity conditions.
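To see why the two forms agree (a sketch, assuming the order of differentiation and integration can be exchanged): differentiating twice gives $$ \partial_i \partial_j \log f_{X\mid\Theta}(X\mid\theta) = \frac{\partial_i \partial_j f_{X\mid\Theta}(X\mid\theta)}{f_{X\mid\Theta}(X\mid\theta)} - \left(\partial_i \log f_{X\mid\Theta}(X\mid\theta)\right) \left(\partial_j \log f_{X\mid\Theta}(X\mid\theta)\right) \, , $$ and the expectation of the first term on the right vanishes, since $\mathrm{E}_\theta\!\left[\partial_i \partial_j f_{X\mid\Theta}(X\mid\theta)/f_{X\mid\Theta}(X\mid\theta)\right] = \int \partial_i \partial_j f_{X\mid\Theta}(x\mid\theta)\,dx = \partial_i \partial_j \int f_{X\mid\Theta}(x\mid\theta)\,dx = 0$. Taking expectations of both sides then recovers $-\mathrm{E}_\theta\left[\partial_i \partial_j \log f_{X\mid\Theta}(X\mid\theta)\right] = I_{ij}$.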

For a nonnull vector $u = (u_1,\dots,u_k)^\top\in\mathbb{R}^k$, it follows from the linearity of the expectation that $$ \sum_{i,j=1}^k u_i I_{ij} u_j = \sum_{i,j=1}^k \left( u_i \mathrm{E}_\theta \left[ \left(\partial_i \log f_{X\mid\Theta}(X\mid\theta)\right) \left(\partial_j \log f_{X\mid\Theta}(X\mid\theta)\right)\right] u_j \right) \\ = \mathrm{E}_\theta \left[ \left(\sum_{i=1}^k u_i \partial_i \log f_{X\mid\Theta}(X\mid\theta)\right) \left(\sum_{j=1}^k u_j \partial_j \log f_{X\mid\Theta} (X\mid\theta)\right)\right] \\ = \mathrm{E}_\theta \left[ \left(\sum_{i=1}^k u_i \partial_i \log f_{X\mid\Theta}(X\mid\theta)\right)^2 \right] \geq 0 \, . $$

If this componentwise notation is too ugly, note that the Fisher Information matrix $H=(I_{ij})$ can be written as $H = \mathrm{E}_\theta\left[S S^\top\right]$, in which the score vector $S$ is defined as $$ S = \left( \partial_1 \log f_{X\mid\Theta}(X\mid\theta), \dots, \partial_k \log f_{X\mid\Theta}(X\mid\theta) \right)^\top \, . $$

Hence, we have the one-liner $$ u^\top H u = u^\top \mathrm{E}_\theta[S S^\top] u = \mathrm{E}_\theta[u^\top S S^\top u] = \mathrm{E}_\theta\left[\left(S^\top u\right)^2\right] \geq 0. $$
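As an illustrative aside (not part of the original answer), this outer-product form lends itself to a quick numerical sanity check; the sketch below assumes a toy Normal$(\mu,\sigma)$ model, estimates $H = \mathrm{E}_\theta[SS^\top]$ by Monte Carlo, and confirms the estimate's eigenvalues are non-negative.

```python
import numpy as np

# Illustrative Monte Carlo check (toy Normal(mu, sigma) model, not from the answer):
# estimate H = E[S S^T] from score outer products and verify it is positive semi-definite.
rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=100_000)

# Score vector S = (d log f / d mu, d log f / d sigma) for the Normal density
s_mu = (x - mu) / sigma**2
s_sigma = (x - mu) ** 2 / sigma**3 - 1.0 / sigma
S = np.stack([s_mu, s_sigma], axis=1)   # one score vector per observation

H = S.T @ S / len(x)                    # Monte Carlo estimate of E[S S^T]
print(H)                                # approx. diag(1/sigma^2, 2/sigma^2)
print(np.linalg.eigvalsh(H))            # eigenvalues are all >= 0 (up to MC error)
```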

Zen
    (+1) Good answer and welcome back, Zen. I was becoming concerned we might have lost you permanently given the length of your hiatus. That would have been a real shame! – cardinal Feb 14 '13 at 02:56
  • How about the quantum Fisher information, what would be the difference in the proof? – wondering Nov 09 '21 at 13:05

WARNING: not a general answer!

If $f(X|\theta)$ corresponds to a full-rank exponential family in its natural parametrization, then the negative Hessian of the log-likelihood is the covariance matrix of the sufficient statistic. Covariance matrices are always positive semi-definite. Since the Fisher information is a convex combination of positive semi-definite matrices, it must also be positive semi-definite.
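For a concrete sketch of why this holds, write the family in its natural-parameter form $f(x\mid\theta) = h(x)\exp\left(\theta^\top T(x) - A(\theta)\right)$. Then $$ -\frac{\partial^2 \log f(x\mid\theta)}{\partial \theta_i \partial \theta_j} = \frac{\partial^2 A(\theta)}{\partial \theta_i \partial \theta_j} = \mathrm{Cov}_\theta\left(T_i(X), T_j(X)\right) \, , $$ which does not depend on $x$ and, being a covariance matrix, is positive semi-definite.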

gusl