8

Let $X_1,\ldots,X_n$ be an i.i.d. sample with parameter $\theta$ and $T$ a statistic. The statistic is called sufficient if, given a value $t$, the distribution $P_{\theta}(X_1,\ldots,X_n|T=t)$ does not depend on $\theta$.

But what is the range of $t$? Do we also need $P_{\theta}(T=t)>0$ for every value of $\theta$, in order to be sure that we are not conditioning on null events? Do we need $\mathrm{Supp}_{\theta} f_T$ to be the same for every value of $\theta$? Does this doubt make sense?

PS: the question is posed in the setting of discrete distributions.
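In the discrete setting the definition can be checked by brute-force enumeration. As a minimal sketch (a hypothetical Bernoulli example with $T=\sum_i X_i$, not part of the question itself), the conditional distribution given $T=t$ comes out the same for every $\theta$:

```python
# Sketch: X_1,...,X_n i.i.d. Bernoulli(theta), T = sum of the X_i.
# For every theta in (0,1), P_theta(X = x | T = t) = 1 / C(n, t),
# so the conditional law does not depend on theta (sufficiency).
from itertools import product
from math import comb

def conditional_given_T(theta, n, t):
    """Return P_theta(X = x | T = t) for each binary tuple x with sum(x) == t."""
    joint = {x: theta**sum(x) * (1 - theta)**(n - sum(x))
             for x in product([0, 1], repeat=n)}
    p_t = sum(p for x, p in joint.items() if sum(x) == t)  # P_theta(T = t)
    return {x: p / p_t for x, p in joint.items() if sum(x) == t}

n, t = 4, 2
for theta in (0.2, 0.5, 0.9):
    cond = conditional_given_T(theta, n, t)
    # Each of the C(4,2) = 6 compatible tuples gets probability 1/6, whatever theta is.
    assert all(abs(p - 1 / comb(n, t)) < 1e-12 for p in cond.values())
```

Note that here $P_\theta(T=t)>0$ for every $\theta\in(0,1)$, which is exactly the situation the question asks about when it fails.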

Thomas
  • As a counter-example, consider the Uniform distribution on $\{1,2,\ldots,\theta\}$ and the use of the sufficient statistic $T_n=\max_{1\le i\le n} X_i$. – Xi'an Oct 24 '19 at 13:03
  • Exactly. In your second example, given $t$, $P_{\theta}(X_1,\ldots,X_n|T=t)$ cannot be evaluated for every value of $\theta$: we need $\theta \ge t$ for the event $T=t$ to have nonzero probability. Isn't there a problem there? What am I missing? The statistic you proposed is supposed to be sufficient. – Thomas Oct 24 '19 at 13:28

2 Answers

3

The fact that $\mathbb{P}_\theta(T=t)$ is not always positive has no impact on sufficiency. As indicated in my example, the statistic $$T=X_{(n)}=\max_{1\le i\le n} X_i$$ is sufficient when the $X_i$'s are uniform on $\{1,\ldots,\theta\}$, and $$\mathbb{P}_\theta(T=t)=0$$ when $\theta<t$. The joint probability mass function of $(X_1,\ldots,X_n,X_{(n)})$ at $(x_1,\ldots,x_n,x_{(n)})$ is also zero when $x_{(n)}>\theta$, hence the conditional probability of $(X_1,\ldots,X_n)$ given $X_{(n)}=t$ is not defined when $t>\theta$. But since the conditional distribution of $(X_1,\ldots,X_n)$ given $X_{(n)}=t$ is uniform over $$\{(x_1,\ldots,x_n);\ x_{(n)}=t\}$$ for every $\theta\ge t$, independently of $\theta$ (and not defined otherwise), this does not impact sufficiency.
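A direct enumeration of this example confirms both claims (a sketch; the helper name is mine, and exact rational arithmetic is used so the conditionals can be compared by equality):

```python
# Sketch of the uniform example: X_i i.i.d. uniform on {1,...,theta}, T = max X_i.
from itertools import product
from fractions import Fraction

def conditional_given_max(theta, n, t):
    """P_theta(X = x | max(x) = t), or None when P_theta(T = t) = 0."""
    tuples = list(product(range(1, theta + 1), repeat=n))
    event = [x for x in tuples if max(x) == t]
    if not event:                       # theta < t: the conditioning event is null
        return None
    p = Fraction(1, theta**n)           # each tuple is equally likely
    p_t = p * len(event)                # P_theta(T = t)
    return {x: p / p_t for x in event}

n, t = 3, 2
# Identical conditional distributions for theta = 5 and theta = 9 (both >= t) ...
assert conditional_given_max(5, n, t) == conditional_given_max(9, n, t)
# ... and no conditional at all when theta < t.
assert conditional_given_max(1, n, t) is None
```

The conditional is the uniform law on the tuples with maximum $t$, whichever admissible $\theta$ generated the data.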

I presume the undefined nature of the conditional in the impossible situation where $X_{(n)}=t$ and $t>\theta$ appears to bring some dependence on $\theta$, but this impression is incorrect: the conditional is undefined precisely because the conditioning event is impossible. The part that carries information on $\theta$ is $X_{(n)}$ itself, which is fine since it is sufficient.

Here is a quote from one of the earlier papers on the topic, by Koopman (1935):

[Quoted excerpt from Koopman (1935); image not reproduced.]

where he similarly sees no impact on sufficiency from the fact that the density may be zero for the actual observations and some values of the parameter.

Xi'an
  • I was reading this answer again but I do not understand the conclusion. Are you saying that in the standard definition we should add "T is sufficient for theta if P_{theta}(X|t) does not depend on theta, considering all thetas for which the event T(x)=t has nonzero probability"? In that case, isn't something important missing from standard textbooks (e.g. Casella & Berger)? As you noticed, this situation appears in one of the first examples of sufficient statistics... – Thomas Dec 31 '21 at 13:56
  • I understand that in your example, even though the sufficient statistic t fixes a range of possible parameters theta over which the conditional density is defined, this does not help to gain more information about theta from the sample beyond what is obtained by knowing just t. But I am more focused on the applicability of the formal definition. – Thomas Dec 31 '21 at 14:01
1

In advanced probability texts, sufficiency is usually defined formally in terms of partitions on the sample space, and the standard definition is then built up as an implication of this for a parametric model. In any case, once you translate to the common definition, the requirement for sufficiency is that this condition should hold for all $t$. So a statistic $T: \mathbb{R}^n \rightarrow \Lambda$ will be sufficient for $\theta$ if and only if it has a conditional probability function $P_{\theta}(\mathbf{X}|T =t)$ satisfying:

$$P_{\theta}(\mathbf{X}|T=t) = P_{\theta'}(\mathbf{X}|T=t) \quad \quad \quad \text{for all } \theta, \theta' \in \Theta \text{ and } t \in \Lambda.$$

Note that in cases where $\mathbb{P}_\theta(T=t)=0$ for some $t$, the conditional probability is defined through the law of total probability, i.e., it is any measurable non-negative function satisfying:

$$\int \limits_\Lambda P_{\theta}(\mathbf{X} \in \mathcal{S}|T=t) \, dF_T(t) = \mathbb{P}_{\theta}(\mathbf{X} \in \mathcal{S}) \quad \quad \quad \text{for all } \theta \text{ and measurable } \mathcal{S}.$$

This latter part gives you some "wiggle room" for sufficiency in regard to sets with probability measure zero. If there is any conditional probability function satisfying the above then we would say that the statistic $T$ is sufficient.
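In the discrete case the displayed integral reduces to a sum over $t$, and the identity can be checked by enumeration. A sketch (hypothetical helper name, using the uniform/maximum example from the other answer):

```python
# Sketch: verify sum_t P_theta(X in S | T = t) * P_theta(T = t) = P_theta(X in S)
# for X_i i.i.d. uniform on {1,...,theta} and T = max X_i (discrete case).
from itertools import product
from fractions import Fraction

def total_probability_check(theta, n, S):
    """Return (lhs, rhs) of the total-probability identity for the event X in S."""
    tuples = list(product(range(1, theta + 1), repeat=n))
    p = Fraction(1, theta**n)                     # probability of each tuple
    lhs = Fraction(0)
    for t in range(1, theta + 1):
        event_t = [x for x in tuples if max(x) == t]
        p_t = p * len(event_t)                    # P_theta(T = t)
        if p_t == 0:
            continue                              # null events contribute nothing
        cond = Fraction(sum(x in S for x in event_t), len(event_t))
        lhs += cond * p_t
    rhs = p * sum(x in S for x in tuples)         # P_theta(X in S)
    return lhs, rhs

lhs, rhs = total_probability_check(theta=4, n=2, S={(1, 1), (1, 2), (3, 4)})
assert lhs == rhs == Fraction(3, 16)
```

Terms with $\mathbb{P}_\theta(T=t)=0$ drop out of the sum, which is the "wiggle room" mentioned above: the conditional can take any value there without affecting the identity.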


If you would like to know more about the underlying definition, I recommend you have a look at the definition of a sufficient partition first. Most textbooks on intermediate or advanced probability will have an explanation of sufficiency that is couched in an initial definition in terms of a partition on the sample space.

Ben
  • Thanks. I see, I think, where you are going. But saying "a subspace that occurs with probability one" is a bit undefined, no? Depending on the value of $\theta$ we have different induced probabilities on $\Lambda$. – Thomas Oct 24 '19 at 13:43
  • Ben, I do not understand the answer as it does not address the issue of the support of the statistic $T$ _depending on the parameter_ $\theta$. This does not seem to relate with the statistic $T$ being defined up to _a set of measure zero._ – Xi'an Oct 27 '19 at 08:15
  • Reading this answer again. Casella & Berger mention partitions of the sample space, a concept that to me is very similar to the sigma-algebra generated by the statistic. But they use it more to develop intuition (e.g. when they introduce minimal sufficient statistics, which have the coarsest induced sigma-algebra among all sufficient statistics) rather than as the starting formal definition, which is the usual one that I reported. Do you have a different reference? – Thomas Jan 04 '22 at 16:38
  • @Xi'an: I have edited the answer to add information about the meaning of the conditional probability function, which I think now answers the substantive query about dealing with sets of measure zero. – Ben Jan 04 '22 at 21:57
  • thanks for your update. Just a notational thing. In the first equation isn't it weird to have a condition $\theta \ne \theta'$ on a relation that is trivially true when $\theta=\theta'$ ? – Thomas Jan 05 '22 at 15:27
  • @Thomas: Yes, fair point. Edited. – Ben Jan 05 '22 at 22:57