
A pdf is usually written as $f(x|\theta)$, where the lowercase $x$ is treated as a realization or outcome of the random variable $X$ that has that pdf. Similarly, a cdf is written as $F_X(x)$, which has the meaning $P(X\le x)$. However, in some circumstances, such as the definition of the score function and the derivation showing that the cdf of a random variable is uniformly distributed, it appears that the random variable $X$ is being plugged into its own pdf/cdf; by doing so, we get a new random variable $Y=f(X|\theta)$ or $Z=F_X(X)$. I don't think we can call this a pdf or cdf anymore, since it is now a random variable itself, and in the latter case the "interpretation" $F_X(X)=P(X\le X)$ seems like nonsense to me.

Additionally, in the latter case above, I am not sure I understand the statement "the cdf of a random variable follows a uniform distribution". The cdf is a function, not a random variable, and therefore doesn't have a distribution. Rather, what has a uniform distribution is the random variable transformed by its own cdf, but I don't see why this transformation is meaningful. The same goes for the score function, where we plug a random variable into its own log-likelihood.

I have been wracking my brain for weeks trying to come up with an intuitive meaning behind these transformations, but I am stuck. Any insight would be greatly appreciated!

mai
  • The notation may be confusing you. E.g., $F_X(X)$ is exactly as meaningful as applying *any* measurable function to $X$ would be. For a correct interpretation you will need to be very clear about what a [random variable is](https://stats.stackexchange.com/questions/50). For any random variable $X:\Omega\to\mathbb{R},$ the function $$Y:\omega\to F_X(X(\omega))$$ for $\omega\in\Omega$ clearly is a random variable and therefore has a distribution $F_Y.$ (Note the two distinct meanings of the symbol "$X$" in "$F_X(X)$.") $F_Y$ is uniform if and only if $X$ has a continuous distribution. – whuber Feb 27 '18 at 19:17
  • This isn't really a measure-theoretic issue: to understand it, you may safely ignore all references to "measurability." You might benefit from studying a little set theory early in your graduate career: that's where most people learn what this basic (and ubiquitous) mathematical terminology and notation really mean, so it's best not to put off learning it. – whuber Feb 27 '18 at 21:01
  • Maybe a word on why one would do a crazy thing like this, i.e., insert a RV into its own density. One example: say you want to estimate the density of $X$. You could measure how good your estimate $f$ is by integrating $f(x)-f_X(x)$, but this is "unfair": you will never achieve a good approximation in regions where you have few data points (i.e., where the true density is small). Hence, a "fair" evaluation would weight the term by the true density. This is more or less the effect of inserting RVs into their own densities... – Fabian Werner Feb 27 '18 at 21:31
  • See also https://stats.stackexchange.com/questions/324768/what-are-the-main-inequalities-used-in-statistical-proofs/324801#324801 – Fabian Werner Feb 27 '18 at 21:31

2 Answers


A transform of a random variable $X$ by a measurable function $T:\mathcal{X}\longrightarrow\mathcal{Y}$ is another random variable $Y=T(X)$ whose distribution is given by the inverse probability transform $$\mathbb{P}(Y\in A) = \mathbb{P}(X\in\{x;\,T(x)\in A\})\stackrel{\text{def}}{=} \mathbb{P}(X\in T^{-1}(A))$$ for all sets $A$ such that $\{x;\,T(x)\in A\}$ is measurable under the distribution of $X$.
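
As a quick numerical sanity check of this inverse-image formula (a sketch only; the Normal example and the set $A=[0,1]$ are arbitrary choices), take $X\sim\mathcal{N}(0,1)$ and $T(x)=x^2$: the event $\{T(X)\in[0,1]\}$ is exactly $\{X\in[-1,1]\}$, so the two probabilities must match.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)     # draws of X ~ N(0, 1)
y = x ** 2                           # Y = T(X) with T(x) = x^2

# P(Y in A) for A = [0, 1], estimated from the transformed sample
p_y = np.mean((y >= 0) & (y <= 1))

# P(X in T^{-1}(A)) = P(-1 <= X <= 1), computed from the cdf of X
p_x = stats.norm.cdf(1) - stats.norm.cdf(-1)

print(p_y, p_x)                      # both approximately 0.683
```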

This property applies to the special case when $F_X:\mathcal{X}\longrightarrow[0,1]$ is the cdf of the random variable $X$: $Y=F_X(X)$ is a new random variable taking its realisations in $[0,1]$. As it happens, $Y$ is distributed as a Uniform $\mathcal{U}([0,1])$ when $F_X$ is continuous. (If $F_X$ is discontinuous, the range of $Y=F_X(X)$ misses part of $[0,1]$, so $Y$ can no longer be uniform over $[0,1]$. What is always the case is that when $U$ is a Uniform $\mathcal{U}([0,1])$, then $F_X^{-}(U)$ has the same distribution as $X$, where $F_X^{-}$ denotes the generalised inverse of $F_X$. This is a formal way to (a) understand random variables as measurable transforms of a fundamental $\omega\in\Omega$, since, taking $\Omega=[0,1]$ equipped with the uniform measure, $X(\omega)=F_X^{-}(\omega)$ is a random variable with cdf $F_X$, and (b) generate random variables from a given distribution with cdf $F_X$.)
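
Here is a small simulation sketch of both facts (the $\mathcal{N}(2,3^2)$ choice is arbitrary, and since this cdf is strictly increasing the generalised inverse is just the ordinary quantile function):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
F = stats.norm(loc=2, scale=3)           # X ~ N(2, 3^2), a continuous distribution

# Y = F_X(X) is Uniform([0, 1]) because F_X is continuous
x = F.rvs(size=100_000, random_state=rng)
y = F.cdf(x)
print(stats.kstest(y, "uniform"))        # large p-value: consistent with U([0, 1])

# Conversely, F_X^{-}(U) has the same distribution as X
# (here F_X is strictly increasing, so ppf is the ordinary inverse cdf)
u = rng.uniform(size=100_000)
x_new = F.ppf(u)
print(stats.kstest(x_new, F.cdf))        # large p-value: consistent with N(2, 3^2)
```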

To understand the paradox of $\mathbb{P}(X\le X)$, take the representation $$F_X(x)=\mathbb{P}(X\le x)=\int_{-\infty}^x \text{d}F_X(t) = \int_{-\infty}^x f_X(t)\,\text{d}\lambda(t)$$ if $\text{d}\lambda$ is the dominating measure and $f_X$ the corresponding density. Then $$F_X(X)=\int_{-\infty}^X \text{d}F_X(t) = \int_{-\infty}^X f_X(t)\,\text{d}\lambda(t)$$ is a random variable, since the upper bound of the integral is random. (This is the only random part of the expression.) The apparent contradiction in $\mathbb{P}(X\le X)$ is due to a confusion in notation. To be properly defined, it needs two independent versions of the random variable $X$, say $X_1$ and $X_2$, in which case the random variable $F_X(X_1)$ is defined by $$F_X(X_1)=\mathbb{P}^{X_2}(X_2\le X_1)$$ the probability being computed under the distribution of $X_2$ only, with $X_1$ held fixed.
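
A simulation makes the two-copies reading concrete (a sketch only; the Exponential choice is arbitrary): fix one realisation of $X_1$, then $F_X(X_1)$ evaluated at that realisation coincides with the probability that an independent copy $X_2$ falls below it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
dist = stats.expon()                           # X ~ Exponential(1), an arbitrary choice

x1 = dist.rvs(random_state=rng)                # one realisation of X_1
x2 = dist.rvs(size=200_000, random_state=rng)  # independent copies playing the role of X_2

# F_X(X_1) evaluated at this realisation ...
print(dist.cdf(x1))
# ... matches the probability that X_2 <= X_1, computed over the distribution of X_2
print(np.mean(x2 <= x1))
```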

The same remark applies to the transform of $X$ by its density (pdf), $f_X(X)$, which is a new random variable, except that it has no fixed distribution as $f_X$ varies. It is nonetheless useful for statistical purposes, for instance when considering a likelihood ratio $f_X(X|\hat{\theta}(X))/f_X(X|\theta_0)$, twice the logarithm of which is approximately a $\chi^2$ random variable under some conditions.
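
For a concrete (if informal) check of that last claim, take $X_1,\dots,X_n\sim\mathcal{N}(\theta,1)$ with true value $\theta_0=0$: the MLE is $\hat\theta=\bar X$ and twice the log likelihood ratio reduces to $n\bar X^2$, which should behave like a $\chi^2_1$ variable (the model and sample size below are arbitrary choices, not part of the argument).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, n_rep, theta0 = 50, 20_000, 0.0

# For N(theta, 1) data, 2 * log LR = n * (xbar - theta0)^2 with the MLE thetahat = xbar
x = rng.normal(loc=theta0, scale=1.0, size=(n_rep, n))
lr_stat = n * (x.mean(axis=1) - theta0) ** 2

# Under the usual conditions this is approximately chi-square with 1 degree of freedom
print(np.mean(lr_stat > stats.chi2.ppf(0.95, df=1)))   # approximately 0.05
```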

And the same holds for the score function $$\dfrac{\partial \log f_X(X|\theta)}{\partial \theta}$$ which is a random variable whose expectation is zero when the score is evaluated at the true value of the parameter, $\theta_0$, i.e., $$\mathbb{E}_{\theta_0}\left[ \dfrac{\partial \log f_X(X|\theta_0)}{\partial \theta}\right]=\int \dfrac{\partial \log f_X(x|\theta_0)}{\partial \theta}f_X(x|\theta_0)\,\text{d}\lambda(x)=0$$
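
Again with the toy model $X\sim\mathcal{N}(\theta,1)$ (an arbitrary choice), the score is simply $X-\theta$, and a simulation sketch shows its mean is essentially zero at the true $\theta_0$ and nonzero at any other value of $\theta$:

```python
import numpy as np

rng = np.random.default_rng(4)
theta0 = 1.5                              # true parameter value
x = rng.normal(loc=theta0, scale=1.0, size=200_000)

# For f(x | theta) = N(theta, 1), the score is d/dtheta log f(x | theta) = x - theta
print(np.mean(x - theta0))                # approximately 0: score at the true value
print(np.mean(x - (theta0 + 0.7)))        # approximately -0.7: nonzero away from the true value
```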

[Answer typed while @whuber and @knrumsey were typing their respective answers!]

Xi'an
  • Could you explain in words what is the meaning/interpretation of the statement $F_X(X_1)=P(X_2 \leq X_1)$? It still seems to me that saying "the cdf of a r.v. has a uniform distribution" does not make any sense. – mai Feb 27 '18 at 20:35
  • The cdf of a rv $F_X$ is not the same thing as the transform of a rv $X$ by the cdf of this rv, namely $F_X(X)$. – Xi'an Feb 27 '18 at 20:43
  • Yes, I agree that they are not the same thing. In the first instance it is not a r.v., while in the second case it is a r.v. Am I correct? – mai Feb 27 '18 at 21:34
  • Yes, which relates to the different meanings of $X$ in $F_X(X)$ – Xi'an Feb 27 '18 at 21:35
  • Could you explain what you mean by "expectation is zero **when taken at the true value of the parameter** $\theta$? It seems like $\theta$ is being treated as a variable here. What changes if $\theta$ is not at its "true value"? – mai Mar 01 '18 at 01:18
  • Never mind, I think I understand. The $\theta$ in the log-likelihood has to be the same $\theta$ as the $\theta$ in the pdf $f(x|\theta)$ (i.e., the "true value") in order for the pdfs to cancel properly after taking the expectation. – mai Mar 01 '18 at 03:24

Like you say, any (measurable) function of a random variable is itself a random variable. It is easier to just think of $f(x)$ and $F(x)$ as "any old functions"; they just happen to have some nice properties. For instance, if $X$ is a standard exponential RV, then there's nothing particularly strange about the random variable $$Y = 1 - e^{-X}$$ It just so happens that $Y=F_X(X)$. The fact that $Y$ has a Uniform distribution (given that $X$ is a continuous RV) can be seen for the general case by deriving the CDF of $Y$:

\begin{align*} F_Y(y) &= P(Y \leq y) \\ &= P(F_X(X) \leq y) \\ &= P(X \leq F^{-1}_X(y)) \\ &= F_X(F^{-1}_X(y)) \\ &= y \end{align*}

This is clearly the CDF of a $U(0,1)$ random variable. Note: this version of the proof assumes that $F_X(x)$ is strictly increasing and continuous, but it is not too much harder to show a more general version.
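
A two-line check of the exponential example above (a sketch under the same setup): drawing standard exponential values and mapping them through $Y=1-e^{-X}$ produces values indistinguishable from $U(0,1)$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.exponential(scale=1.0, size=100_000)   # X ~ standard exponential
y = 1 - np.exp(-x)                             # Y = F_X(X) = 1 - exp(-X)

print(stats.kstest(y, "uniform"))              # large p-value: consistent with U(0, 1)
```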

knrumsey
  • Your conclusion is incorrect for most strictly increasing $F_X$: you have assumed $F_X\circ F_X^{-1}$ is the identity--but that's not always the case. – whuber Feb 27 '18 at 19:19
  • Yes, thank you. The random variable $X$ clearly must be continuous. Am I missing anything now? – knrumsey Feb 27 '18 at 19:32
  • $F_X$ does not need to be bijective. Take, for example, the case where $X$ itself has a uniform distribution! The closure of the image of $F_X$ needs to be the entire interval $[0,1].$ That's essentially the definition of a continuous distribution. – whuber Feb 27 '18 at 19:45