
I've been reading Lehmann and Casella's Theory of Point Estimation, 2nd Edition (TPE). In Chapter 1, Section 6 (pp. 32-33), they introduce the idea of a "randomized estimator". Their explanation is: if $X \sim \mathcal{P} = \{ P_\theta \mid \theta\in \Theta \}$ and $T$ is sufficient for $\mathcal{P}$, then since $T$ contains all the information about $\theta$, one can generate $X'$ from the information $T=t$ alone such that $X$ and $X'$ have the same unconditional distribution, and so do $\delta(X)$ and $\delta(X')$. They call $\delta(X')$ a randomized estimator.

I am struggling to make sense of this statement. For example, let $X_1,\dots,X_n \sim N(\mu, 1)$, i.i.d.; then $\bar{X}$ is sufficient for $\mu$. If we want to estimate $\mu$ by $\delta(X) = \bar{X}$, then $\bar{X} \sim N(\mu, n^{-1})$. Upon observing the realization of the sufficient statistic, $\bar{X} = \bar{x}$, we can generate a new sample $X_1', \dots, X_n' \sim N(\bar{x}, 1)$. Then the unconditional distribution of $\delta(X') = \bar{X}'$ is $N(\mu, 2n^{-1})$, so the unconditional distributions do not coincide. What part am I doing wrong?
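To spell out the variance computation, by the law of total variance (with $\bar X' \mid \bar X = \bar x \sim N(\bar x, n^{-1})$ and $\bar X \sim N(\mu, n^{-1})$):
$$
\operatorname{Var}(\bar X') \;=\; \operatorname{E}\big[\operatorname{Var}(\bar X' \mid \bar X)\big] + \operatorname{Var}\big(\operatorname{E}[\bar X' \mid \bar X]\big) \;=\; \frac{1}{n} + \frac{1}{n} \;=\; \frac{2}{n}.
$$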

user789100

1 Answer


Randomised estimators are defined in a more general setting than this one and, in particular, need not be connected with sufficiency. In full generality, a randomised decision rule is a decision rule that returns a random decision for a given observation (or dataset). To reproduce the quote from Lehmann and Casella:

The starting point of a statistical analysis, as formulated in the preceding sections, is a random observable $X$ taking on values in a sample space $\mathcal{X}$, and a family of possible distributions of $X$. It often turns out that some part of the data carries no information about the unknown distribution and that $X$ can therefore be replaced by some statistic $T = T(X)$ (not necessarily real-valued) without loss of information. A statistic $T$ is said to be sufficient for $X$, or for the family $\mathcal{P} = \{P_\theta,\ \theta\in\Omega\}$ of possible distributions of $X$, or for $\theta$, if the conditional distribution of $X$ given $T = t$ is independent of $\theta$ for all $t$.

This definition is not quite precise and we shall return to it later in this section. However, consider first in what sense a sufficient statistic $T$ contains all the information about $\theta$ contained in $X$. For that purpose, suppose that an investigator reports the value of $T$, but on being asked for the full data, admits that they have been discarded. In an effort at reconstruction, one can use a random mechanism (such as a pseudo-random number generator) to obtain a random quantity $X'$ distributed according to the conditional distribution of $X$ given $t$. (This would not be possible, of course, if the conditional distribution depended on the unknown $\theta$.) Then the unconditional distribution of $X'$ is the same as that of $X$, that is, $$ P_\theta (X' \in A) = P_\theta (X \in A)\quad\text{for all }A, $$ regardless of the value of $\theta$. Hence, from a knowledge of $T$ alone, it is possible to construct a quantity $X'$ which is completely equivalent to the original $X$. Since $X$ and $X'$ have the same distribution for all $\theta$, they provide exactly the same information about $\theta$ (for example, the estimators $\delta(X)$ and $\delta(X')$ have identical distributions for any $\theta$).
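As an illustration of this reconstruction (not part of the quoted text), here is a short simulation sketch in Python for the $N(\mu,1)$ example of the question: sampling $X'$ from the conditional distribution of $X$ given $\bar X=\bar x$ amounts to recentring the residuals of an auxiliary i.i.d. normal vector at $\bar x$, and the reconstructed $X'$ then matches $X$ in unconditional distribution. The values of `mu`, `n`, `n_rep` and all variable names are illustrative choices.

```python
import numpy as np

# A minimal simulation sketch of the reconstruction above for the N(mu, 1)
# example of the question; mu, n, n_rep and all names are illustrative.
rng = np.random.default_rng(0)
mu, n, n_rep = 1.3, 5, 100_000

X = rng.normal(mu, 1.0, size=(n_rep, n))        # original samples from P_theta
x_bar = X.mean(axis=1, keepdims=True)           # sufficient statistic T(X)

# Sampling X' from the conditional law of X given X_bar = x_bar: the residuals
# of an auxiliary i.i.d. N(0, 1) vector about their own mean have exactly that
# theta-free distribution, so we recentre them at the observed x_bar.
Z = rng.normal(0.0, 1.0, size=(n_rep, n))
X_prime = x_bar + (Z - Z.mean(axis=1, keepdims=True))

# X and X' should agree in (unconditional) distribution, and so should the
# estimators delta(X) = X_bar and delta(X') = X'_bar.
print(X[:, 0].mean(), X[:, 0].std())                     # approx mu and 1
print(X_prime[:, 0].mean(), X_prime[:, 0].std())         # approx mu and 1
print(X.mean(axis=1).std(), X_prime.mean(axis=1).std())  # both approx n**-0.5
```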

The estimator $\delta(X')$ is possibly random given the observed realisation $t$ of $T(X)$: each time $\delta(X')$ is computed, unless $\delta(X')=\delta(X)$ with probability one, a different realisation occurs. This means that $\delta(X')$ is a random variable given the observed realisation of the original data $X$, rather than a deterministic value, which explains the following quote, where the notion of randomised estimator is introduced.

The construction of $X'$ is, in general, effected with the help of an independent random mechanism. An estimator $\delta(X')$ depends, therefore, not only on $T$ but also on this mechanism. It is thus not an estimator as defined in Section 1, but a randomized estimator. Quite generally, if $X$ is the basic random observable, a randomized estimator of $g(\theta)$ is a rule which assigns to each possible outcome $x$ of $X$ a random variable $Y(x)$ with a known distribution. When $X = x$, an observation of $Y(x)$ will be taken and will constitute the estimate of $g(\theta)$. The risk, defined by (1.10), of the resulting estimator is then $$ \int_{\mathcal X}\int_{\mathcal Y} L(\theta , y)\,dP_Y(y\mid X=x)\, dP_X(x;\theta), $$ where the probability measure in the inside integral does not depend on $\theta$.
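As a side note (not from the book), the double integral can be read as nested sampling and checked by Monte Carlo: draw $x$ from the outer measure $P_X(\cdot;\theta)$, then $y$ from the $\theta$-free inner measure $P_Y(\cdot\mid X=x)$, and average the loss. The toy randomised rule $Y(x)\sim N(x,\tau^2)$ for a single $N(\theta,1)$ observation, the squared-error loss, and the parameter values below are purely illustrative assumptions.

```python
import numpy as np

# Hedged Monte Carlo reading of the risk formula above, for a toy randomised
# rule Y(x) ~ N(x, tau^2) and squared-error loss; all names are assumptions.
rng = np.random.default_rng(1)
theta, tau, n_rep = 0.7, 0.5, 200_000

x = rng.normal(theta, 1.0, size=n_rep)   # outer integral: one N(theta, 1) observation
y = rng.normal(x, tau)                   # inner integral: randomisation given X = x
risk_mc = np.mean((y - theta) ** 2)      # loss L(theta, y) = (y - theta)^2

print(risk_mc)   # approx 1 + tau**2 = 1.25 for this particular rule
```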

In the special case of the Normal distribution proposed in the question,

  1. the new sample $(X_1^\prime,\ldots,X_n^\prime)$ must be generated from the conditional distribution of $X$ given $T(X)=\bar x$, not i.i.d. from $N(\bar x,1)$ as in the question; conditional on $T(X)=\bar x$, the reconstructed sample satisfies $\bar{X^\prime}=\bar x$ exactly. In particular, if $\delta(X)=\bar X$, then $\delta(X^\prime)=\bar x$;
  2. the proposed estimator $\delta(X^\prime)$, namely the sample mean, is then non-randomised, which definitely cancels the appeal of the example! If instead the sample median were used to estimate the mean $\mu$, the estimator $\delta(X^\prime)$ would be truly randomised, since the reconstructed normal sample $X^\prime$ differs from $X$ and its median remains random conditional on $\bar X$ (with conditional expectation $\bar X$, though), as the simulation sketch after this list illustrates.
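Here is a minimal simulation sketch of that last point, under the same $N(\mu,1)$ setup; the fixed value of $\bar x$, the sample size, and the variable names are illustrative choices. Conditional on one observed $\bar x$, repeated reconstructions $X'$ keep the sample mean pinned at $\bar x$, while the sample median varies from one redraw to the next.

```python
import numpy as np

# Minimal sketch of point 2: for a single observed value of the sufficient
# statistic, redraw X' many times from the conditional law of X given
# X_bar = x_bar; the sample mean of X' is degenerate at x_bar while the
# sample median keeps changing.  x_bar, n, n_redraws are illustrative values.
rng = np.random.default_rng(2)
n, n_redraws = 5, 50_000
x_bar = 0.42                                    # one observed realisation of X_bar

Z = rng.normal(0.0, 1.0, size=(n_redraws, n))
X_prime = x_bar + (Z - Z.mean(axis=1, keepdims=True))    # conditional redraws

means = X_prime.mean(axis=1)          # identically x_bar: the mean is non-randomised
medians = np.median(X_prime, axis=1)  # varies across redraws: a randomised estimator

print(means.std())                    # essentially 0 (floating-point noise)
print(medians.mean(), medians.std())  # centred near x_bar with positive spread
```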
Xi'an