
I would like to know: when determining the Fisher information from the likelihood model, why do we take the log of the likelihood first instead of using the plain likelihood?

GENIVI-LEARNER

1 Answer


The Fisher information uses the log-likelihood by definition*. In that form you get equations like the following:

A bound for the variance $\sigma_{\hat\theta}^2$ of the estimate $\hat\theta$ is (if $\hat\theta$ is normally distributed)

$$\frac{1}{\sigma_{\hat\theta}^2} = -E\left(\frac{\partial^2}{\partial \theta^2} \log \mathcal{L}(x)\right)$$
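
As an illustration, here is a minimal simulation sketch; the Bernoulli($p$) model, the sample sizes, and the variable names are just assumptions chosen for the example. The average of the negative second derivative of the log-likelihood matches the inverse variance of the maximum-likelihood estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
p_true, n, n_sim = 0.3, 200, 20_000

# Simulate many Bernoulli datasets and record the MLE p_hat = sample mean for each.
x = rng.binomial(1, p_true, size=(n_sim, n))
p_hat = x.mean(axis=1)

# Fisher information I(p) = -E[ d^2/dp^2 log L(p) ], evaluated at the true p.
# For Bernoulli data, d^2/dp^2 log L = -k/p^2 - (n - k)/(1 - p)^2 with k = sum(x).
k = x.sum(axis=1)
fisher_info = -(-k / p_true**2 - (n - k) / (1 - p_true)**2).mean()

print("Fisher information      :", fisher_info)      # ~ n / (p_true * (1 - p_true))
print("1 / variance of the MLE :", 1 / p_hat.var())  # the two agree closely
```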

*If you wish for something more than just the definition: you might see this expression, with the logarithm, arise from a derivation in which we assume that $\hat{\theta}$ is approximately normally distributed (and that is where the logarithm enters the stage).
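
As a sketch of that footnote (assuming $\hat\theta \sim N(\theta, \sigma_{\hat\theta}^2)$, the approximation mentioned above): the normal density is the exponential of a quadratic, so taking the logarithm exposes that quadratic, and its curvature is exactly the precision $1/\sigma_{\hat\theta}^2$:

$$\log f(\hat\theta) = \text{const} - \frac{(\hat\theta-\theta)^2}{2\sigma_{\hat\theta}^2} \qquad\Longrightarrow\qquad -\frac{\partial^2}{\partial \theta^2}\log f(\hat\theta) = \frac{1}{\sigma_{\hat\theta}^2}$$

This is how the second derivative of the log connects the curvature of the (log-)likelihood to the variance of the estimate.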

Sextus Empiricus
  • I didn't get the last part: why does the logarithm enter the stage if $\hat{\theta}$ is approximately normally distributed? – GENIVI-LEARNER Apr 03 '20 at 12:07
  • The reason I asked this question is that I suspect there is more to the logarithm than what people are mentioning. One of the links Xi'an posted answers it in detail, but it merely offers the notion of "convenience" of calculation, as in the log domain multiplications are easy to handle. But I suspect that the reason both Shannon and Fisher information use the log is that it might have to do with combinations or the parts of the base present. – GENIVI-LEARNER Apr 03 '20 at 12:14
  • When it's normally distributed, $$f(\hat\theta)\propto \exp\left(-\frac{(\hat\theta-\theta)^2}{2\sigma_{\hat\theta}^2}\right),$$ then you can connect the second derivative of the *logarithm* of the distribution of $\hat\theta$ to the frequency of the observations that lead to a particular observation/estimate of $\hat\theta$ (see also Fisher's [On the mathematical foundations of theoretical statistics](https://doi.org/10.1098/rsta.1922.0009)). – Sextus Empiricus Apr 03 '20 at 12:32
  • I would suggest that you remove the accepted mark from my answer for the moment. I personally feel that I have not yet given a sufficiently intuitive view that would amount to a complete answer. I can follow the derivation of the bound, but I would like to turn it into a clearer picture, rather than a list of equations that leave one puzzled about how the intuition gets from one place to the other. – Sextus Empiricus Apr 03 '20 at 12:37
  • I am quite impressed with your morals. Honestly, I marked it because I didn't expect the intuitive answer I was looking for from anyone else. whuber suggested something in the comments of [this post](https://stats.stackexchange.com/questions/289190/theoretical-motivation-for-using-log-likelihood-vs-likelihood) but it was left unattended. I also believe there is more to what the log transformation is measuring than what we see on the surface. Thinking about it, the reason self-information is merely the negative log of the probability was bothering me. For me, probability was itself a sufficient measure – GENIVI-LEARNER Apr 03 '20 at 12:57
  • (continued) but taking the log transformation of the probability measure reveals something. That something is what I was seeking intuitively. I suspect it measures the "complete parts" of the base present in the random variable. So essentially, if all the parts are complete, $-\log_2 1 = 0$: there is nothing missing, so the random variable can't inform anything. Now this is just speculation, so if we apply similar speculation to Fisher information, it is measuring the missing information in $\theta$ – GENIVI-LEARNER Apr 03 '20 at 13:01
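
To put a number on the self-information $-\log_2 p$ mentioned in the comments above, here is a tiny sketch (the probabilities are just example values): an outcome with probability 1 carries 0 bits, and the information grows as the probability shrinks.

```python
import math

# Self-information of an outcome with probability p, in bits: -log2(p) = log2(1/p).
for p in (1.0, 0.5, 0.25, 0.01):
    print(f"p = {p:<4} -> self-information = {math.log2(1 / p):.2f} bits")
```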