I am slightly confused about two versions of the idea that the score function has expected value of zero.

I learned that the score function is essentially the slope of the log-likelihood. We would expect that at the true parameter $\theta_0$ the slope is zero in expectation (i.e. the log likelihood is maximized on average at $\theta_0$). It is written as $E[\sum_{i=1}^{n} \frac{\partial}{\partial \theta} l(\theta | x_i) |_{\theta = \theta_0}] = 0$, where $\theta_0$ is the true parameter of the distribution and $l(\theta)$ is the log-likelihood function.
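For reference, here is my understanding of the standard one-line derivation of this property (under regularity conditions that allow differentiating under the integral sign). The key point is that the same $\theta$ indexes both the density inside the expectation and the derivative:

$$E_\theta\left[\frac{\partial}{\partial\theta}\log f(X;\theta)\right] = \int \frac{\frac{\partial}{\partial\theta} f(x;\theta)}{f(x;\theta)}\, f(x;\theta)\,dx = \frac{\partial}{\partial\theta}\int f(x;\theta)\,dx = \frac{\partial}{\partial\theta}\, 1 = 0.$$

So the identity seems to hold at any $\theta$, as long as the expectation is taken under that same $\theta$.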

The accepted answer to this question is an example of my understanding: Fisher's score function has mean zero - what does that even mean?

But I see in various places no effort to distinguish between a) the "true" parameter $\theta_0$ of the distribution, at which point we'd expect the log likelihood to be maximized, and b) just any parameter $\theta$ that we can analyze log likelihood at. Can someone help me reconcile this?

Here are examples of this:

Page 2 in the proof of the Cramer Rao inequality http://fisher.stats.uwo.ca/faculty/kulperger/SS3858/Handouts/Ch8-7and8-CramerRao-SufficientStats.pdf

Here, the derivative is taken with respect to an arbitrary parameter $\theta$, yet the expected value of the unbiased estimator $T$ is not written with a distinguished $\theta_0$. It seems like the two are being conflated?

The same situation appears in this textbook: Cramer-Rao Lower Bound Proof, John Rice

Under the section "Expected score is zero": http://gregorygundersen.com/blog/2019/11/21/fisher-information/#expected-score-is-zero

First page http://cseweb.ucsd.edu/~elkan/291winter2005/lect09.pdf
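A quick simulation may make the distinction concrete. Below is a minimal sketch (the setup is my own illustration, assuming i.i.d. $N(\theta_0, \sigma^2)$ data with known $\sigma$): the Monte Carlo average of the score is near zero only when the score is evaluated at the same $\theta$ that generated the data.

```python
import numpy as np

rng = np.random.default_rng(0)
theta0, sigma, n = 2.0, 1.0, 50   # true mean, known sd, sample size
reps = 100_000                    # Monte Carlo replications

# Draw `reps` independent samples of size n from N(theta0, sigma^2)
samples = rng.normal(theta0, sigma, size=(reps, n))

def expected_score(theta):
    # Score for the mean of N(theta, sigma^2): sum_i (x_i - theta) / sigma^2.
    # Averaging over replications estimates E_{theta0}[score(theta)],
    # whose analytic value is n * (theta0 - theta) / sigma^2.
    scores = samples.sum(axis=1) / sigma**2 - n * theta / sigma**2
    return scores.mean()

for theta in (1.5, 2.0, 2.5):
    print(theta, expected_score(theta))
```

Evaluated at the true parameter $\theta = \theta_0 = 2.0$ the average score is essentially zero, while at $\theta = 1.5$ or $\theta = 2.5$ it sits near the analytic value $n(\theta_0 - \theta)/\sigma^2 = \pm 25$, which is what prompts my question about how the proofs can use a generic $\theta$.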

    I agree the distinction between the true $\theta$ and any $\theta$ should be clearer sometimes. But you can see that indeed in the proofs you mentioned, even if they sometimes write just $\theta$, it's actually always the true $\theta$, or more generally, irrespective of what is really "true" or not, the important thing is that it's **the same value of $\theta$** 1) that defines the distribution w.r.t. which the expectation is taken (so often we say it's the true one), AND 2) at which we evaluate the Fisher information. Otherwise both terms don't cancel out in the proof. – William de Vazelhes Mar 02 '21 at 13:35
    The expectation $E$ in the property should be indexed by $\theta_0$, e.g., $E_{\theta_0}$. – Xi'an Mar 02 '21 at 14:37
