
According to the definition of sufficiency, a statistic is sufficient for a parameter if the conditional distribution of $X$ given the value of the statistic does not depend on the parameter.

What I am trying to understand is how the conditional distribution of $X$ not being a function of the parameter fits the intuitive meaning of sufficiency, i.e. that the statistic holds the same amount of information as the sample itself.

kjetil b halvorsen

2 Answers


If the distribution of something you observe does not depend on a parameter, it cannot possibly give you information about it.

Now, if the distribution of $X$ depends on the parameter $\theta$ and the distribution of $X$ given the sufficient statistic $S$ does not, it must be the case that all information about $\theta$ is in $S$; once the value of $S$ is given, the value of $X$ becomes irrelevant, because the conditional distribution of $X$ no longer depends on $\theta$.
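A concrete instance (my illustration, not part of the answer): for $X_1, \dots, X_n$ i.i.d. Bernoulli($\theta$) with sufficient statistic $S = \sum_i X_i$,

$$P(X = x \mid S = s) = \frac{P(X = x)}{P(S = s)} = \frac{\theta^{s}(1-\theta)^{n-s}}{\binom{n}{s}\theta^{s}(1-\theta)^{n-s}} = \binom{n}{s}^{-1},$$

which is free of $\theta$: once $s$ is known, the particular arrangement of successes in the sample carries no further information about $\theta$.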

F. Tusell

We need some more notation. Suppose the random variable $X$ has a distribution from some family $f(x; \theta)$ parametrized by $\theta \in \Theta$, and that $T=T(X)$ is a sufficient statistic for $\theta$. Then by the factorization theorem we have $$ f(x; \theta)= h(x) g(T(x); \theta) $$ where $h$ is a function not depending on $\theta$. Now, using the result from Can the Fisher factorization theorem be understood as a product of densities?, this can be interpreted as a factorization of the distribution of $X$, and we can use it to simulate from the distribution of $X$ by first simulating $T$ and then simulating from the conditional distribution of $X \mid T=t$.
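As a concrete check of the factorization (my example, not from the answer): for $X_1, \dots, X_n$ i.i.d. Bernoulli($\theta$) and $T(x) = \sum_i x_i$,

$$f(x; \theta) = \prod_{i=1}^n \theta^{x_i}(1-\theta)^{1-x_i} = \underbrace{1}_{h(x)} \cdot \underbrace{\theta^{T(x)}(1-\theta)^{n-T(x)}}_{g(T(x);\, \theta)},$$

so the factorization holds with $h \equiv 1$: the likelihood depends on the data only through $T(x)$.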

So after having observed $T(x)=t$, we can simulate surrogate data with the same distribution as $X$ by drawing from this conditional distribution, which by sufficiency does not depend on $\theta$. This gives intuitive meaning to sufficiency: knowing only $T(X)=t$, we can recreate by simulation surrogate data with the same distribution as $X$.
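This simulation idea can be sketched in code. A minimal sketch (my illustration, not the answer's code), assuming i.i.d. Bernoulli data where $T$ is the number of successes: given $T = t$, the conditional distribution of $X$ is uniform over all 0/1 vectors with exactly $t$ ones, so we can draw from it without ever knowing $\theta$.

```python
import numpy as np

# Illustration: X_1..X_n i.i.d. Bernoulli(theta), sufficient statistic
# T = sum of the X_i. Given T = t, X is uniform over all 0/1 vectors
# with exactly t ones -- a distribution that is free of theta.

rng = np.random.default_rng(0)
n, theta = 10, 0.7

x = rng.binomial(1, theta, size=n)   # observed data
t = int(x.sum())                     # its sufficient statistic

def surrogate(n, t, rng):
    """Draw from X | T = t: place t ones uniformly at random among n slots."""
    x_new = np.zeros(n, dtype=int)
    positions = rng.choice(n, size=t, replace=False)
    x_new[positions] = 1
    return x_new

x_star = surrogate(n, t, rng)        # surrogate data, simulated without theta
```

The key point is that `surrogate` never uses `theta`: everything it needs is contained in $t$, which is exactly what sufficiency promises.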

There are other ways to give intuitive meaning to $T$ having the same information content as $X$, via its use in inference. Without going into details:

  • The MLE (maximum likelihood estimator) of $\theta$ is a function of $T$ (or, if non-unique, can be chosen to be one)

  • Given a prior for $\theta$, the Bayesian posterior will be a function of $T$

and there are many more general results of this sort. The two above should be easy exercises.
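As an illustration of both bullets (my sketch, assuming $X_1, \dots, X_n$ i.i.d. Bernoulli($\theta$) with $T = \sum_i X_i$):

$$\hat\theta_{\mathrm{MLE}} = \frac{T}{n}, \qquad \theta \mid x \sim \mathrm{Beta}(a + T,\; b + n - T) \text{ for a } \mathrm{Beta}(a, b) \text{ prior},$$

so both the MLE and the posterior depend on the data only through $T$.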

kjetil b halvorsen