
According to Wikipedia (https://en.wikipedia.org/wiki/Score_(statistics)), the expected value of the score function should equal zero, and the proof is as follows:

\begin{equation} \begin{aligned} \mathbb{E}\left\{ \frac{\partial}{\partial \beta} \ln \mathcal{L}(\beta|X) \right\} &= \int^{\infty}_{-\infty} \frac{\frac{\partial}{\partial \beta} p(X|\beta)}{p(X|\beta)} p(X|\beta) \, dX \\ &= \frac{\partial}{\partial \beta} \int^{\infty}_{-\infty} p(X|\beta) \, dX = \frac{\partial}{\partial \beta} 1 = 0 \end{aligned} \end{equation}
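For a concrete example (my own, assuming a normal location model): if $X \sim \mathcal{N}(\beta, \sigma^2)$, then $$\frac{\partial}{\partial \beta} \ln p(X|\beta) = \frac{X-\beta}{\sigma^2},$$ which indeed has expectation zero, since $\mathbb{E}[X] = \beta$.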

My question is: why is the probability density function of the random variable $\frac{\partial}{\partial \beta} \ln p(X|\beta)$ equal to $p(X|\beta)$? Many thanks!

Stephen Ge

2 Answers


Method 1

  • You apply:

    the rule for the derivative of a logarithm $$\frac{\partial}{\partial y} \log\left[f(x,y)\right] = \frac{\frac{\partial}{\partial y} f(x,y)}{f(x,y)}$$

  • And you apply:

    the formula for the expectation of any value $T(X)$ derived from the data $$E\left[T(X) \right] = \int_{-\infty}^\infty \overbrace{T(x)}^{\llap{\text{value of}}\rlap{\text{ $T(X)$ at point $x$}}} \underbrace{f(x)}_{\llap{\text{density of}}\rlap{\text{ r.v. $X$ at point $x$}}} dx$$ For example, you may have used it for the $n$-th moment of $X$: $$E\left[X^n \right] = \int_{-\infty}^\infty x^n f(x) \, dx$$ (a quick numerical check of this rule follows right after this list)
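Here is a minimal numerical sketch of that expectation rule (my own example, using $X \sim \mathcal{N}(0,1)$ and $T(X) = X^2$, so the answer should be the second moment, which is $1$):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# E[T(X)] for T(x) = x^2 and X ~ N(0, 1): integrate T(x) against
# the density of X itself; the density of T(X) is never needed.
lotus, _ = quad(lambda x: x**2 * stats.norm.pdf(x), -np.inf, np.inf)

# Monte Carlo version of the same expectation.
rng = np.random.default_rng(0)
mc = (rng.standard_normal(1_000_000) ** 2).mean()

print(lotus, mc)  # both close to 1, the second moment of N(0, 1)
```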

So, combining those two:

$$\begin{array}{rcl} E\left[\frac{\partial}{\partial \beta} \log\left[f(x,\beta)\right] \right] &=& E\left[ \frac{\frac{\partial}{\partial \beta} f(x,\beta)}{f(x,\beta)} \right]\\ &=& \int_{-\infty}^\infty \frac{\frac{\partial}{\partial \beta} f(x,\beta)}{f(x,\beta)} f(x,\beta) \, dx \\ &=& \int_{-\infty}^\infty \frac{\partial}{\partial \beta} f(x,\beta) \, dx \\ &=& \frac{\partial}{\partial \beta} \int_{-\infty}^\infty f(x,\beta) \, dx \\ &=& \frac{\partial}{\partial \beta} 1 \\ &=& 0 \end{array}$$
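A quick numerical check of this chain (my own sketch, again assuming the normal location model $f(x,\beta) = \mathcal{N}(x \mid \beta, 1)$, whose score is $x - \beta$):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

beta = 1.3  # arbitrary parameter value for the check

# Second step of the derivation: integrate score(x) * f(x, beta) over x.
score_int, _ = quad(
    lambda x: (x - beta) * stats.norm.pdf(x, loc=beta), -np.inf, np.inf
)

# Third step: the same thing as the integral of the derivative of f
# itself, here approximated with a central finite difference in beta.
h = 1e-5
df_int, _ = quad(
    lambda x: (stats.norm.pdf(x, loc=beta + h)
               - stats.norm.pdf(x, loc=beta - h)) / (2 * h),
    -np.inf, np.inf,
)

print(score_int, df_int)  # both approximately 0
```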

Method 2

And this method relates directly to your question:

  • You apply the transformation rule for a random variable: say $g(t)$ is the density of $T$ and $f(x)$ is the density of $X$; then, if $x=h(t)$, you transform from $f(x)$ to $g(t)$ using $$g(t) = f(h(t)) \left| \frac{\partial h(t)}{\partial t} \right|$$

    So the density $g(t)$ of $T = \frac{\partial}{\partial y} \log\left[f(x,y)\right]$ would be the following:

    $$ g(t) = \text{I am a bit stuck expressing this, but let's continue} $$ (it is not so easy)

    And the expectation will be computed as:

    $$E\left[ \frac{\partial}{\partial y} \log\left[f(x,y)\right] \right] = E\left[t \right] = \int_{-\infty}^{\infty} t g(t) dt$$

So with this second method you integrate with respect to $t$ and not $x$. The derivation on Wikipedia uses Method 1, where you should interpret that part of the integrand not as the density of $T$ but as the density of $X$. Either you use $E\left[T \right] = \int_{-\infty}^{\infty} t \, g(t) \, dt$ or you use $E\left[T \right] = \int_{-\infty}^{\infty} T(x) \, f(x) \, dx$; both give the same number (see the sketch below).
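To make the two routes concrete with a transform whose $g(t)$ *is* easy to write down (my own example, since the score itself is awkward here: $X \sim \text{Exp}(1)$ and $T = \log X$, so $x = h(t) = e^t$ and $g(t) = e^{-e^t} e^t$):

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x)                 # density of X ~ Exp(1), x > 0
g = lambda t: f(np.exp(t)) * np.exp(t)   # density of T = log(X), by the transformation rule

# Method 2: integrate t against the density of T.
e_t, _ = quad(lambda t: t * g(t), -np.inf, np.inf)

# Method 1: integrate T(x) = log(x) against the density of X.
e_x, _ = quad(lambda x: np.log(x) * f(x), 0, np.inf)

print(e_t, e_x)  # both close to -0.5772, minus the Euler-Mascheroni constant
```

Both integrals agree, which is the whole point: Method 1 never requires you to work out $g(t)$.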

Sextus Empiricus

Because your random variable is still $X$ with pdf $p(X|\beta)$.

When you take the expectation with respect to $X$ of $\frac{\partial}{\partial \beta} \ln \mathcal{L}(\beta|X)$, this is just a function of $X$; hence the pdf to integrate against is $p(X|\beta)$.

I see it's also called the "law of the unconscious statistician".
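In sampling terms (a minimal sketch of my own, again using the normal location model whose score is $X - \beta$): you draw from the distribution of $X$ and average the function values; the density of the score itself never appears.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 0.7                                # arbitrary true parameter
x = rng.normal(loc=beta, size=1_000_000)  # draws of X, with pdf p(X|beta)

# "Unconscious statistician": average score(x_i) over draws of X.
print(np.mean(x - beta))                  # approximately 0
```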

Ale