I came across a casual remark on The Chemical Statistician that a sample median could often be a choice for a sufficient statistic but, besides the obvious case of one or two observations where it equals the sample mean, I cannot think of another non-trivial and iid case where the sample median is sufficient.
-
Did you mean to write "that a sample median could often be"? – Juho Kokkala Nov 06 '14 at 13:55
-
It's an interesting question; the double exponential has the median for a ML estimator of its location parameter, but it's not sufficient. – Glen_b Nov 06 '14 at 15:57
-
honestly, i strongly feel that something is missing in this Q&A, how is it possible that a ML estimator is not sufficient for itself? sorry for just throwing my doubts this way, i was never really interested in sufficient statistics. – carlo Aug 15 '20 at 07:28
-
@carlo: what do you mean by "sufficient for itself"? – Xi'an Aug 15 '20 at 19:47
-
@Xi’an, I deleted my comment, but I’ll see if I can turn the ideas into a proper answer. – Matt F. Jun 29 '21 at 10:33
1 Answer
When the support of the distribution does not depend on the unknown parameter $\theta$, we can invoke the (Fréchet-Darmois-)Pitman-Koopman theorem, namely that the density of the observations is necessarily of the exponential family form, $$ \exp\{ \theta T(x) - \psi(\theta) \}h(x) $$ Since the natural sufficient statistic $$ S=\sum_{i=1}^n T(x_i) $$ is also minimal sufficient, it must be a function of every other sufficient statistic. If the median were sufficient, $S$ would therefore have to be a function of the median, which is impossible: modifying an extreme in the observations $x_1,\ldots,x_n$, $n>2$, modifies $S$ but does not modify the median. Therefore, the median cannot be sufficient when $n>2$.
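To make the argument concrete, here is a minimal numerical sketch, assuming the $N(\theta,1)$ model as the exponential-family example (so $T(x)=x$ and $S=\sum x_i$). The model, the two samples, and the function names are illustrative choices, not part of the argument above. If the median were sufficient, the likelihood ratio between two values of $\theta$ could depend on the data only through the median, yet two samples sharing a median give different ratios.

```python
import math
from statistics import median


def normal_loglik(theta, xs):
    """Log-likelihood of an iid N(theta, 1) sample (illustrative model)."""
    return sum(-0.5 * (x - theta) ** 2 - 0.5 * math.log(2 * math.pi) for x in xs)


def loglik_ratio(xs, theta1, theta2):
    """log L(theta1) - log L(theta2); the h(x) factor cancels in this ratio."""
    return normal_loglik(theta1, xs) - normal_loglik(theta2, xs)


# Two samples with the same median; only the largest observation differs.
a = [-1.0, 0.0, 1.0]
b = [-1.0, 0.0, 5.0]

same_median = median(a) == median(b)
ratio_a = loglik_ratio(a, 1.0, 0.0)
ratio_b = loglik_ratio(b, 1.0, 0.0)

print("same median:", same_median)
print("log-likelihood ratios:", ratio_a, ratio_b)
```

In this model the ratio equals $S - n/2$, so it moves with the sum while the median stays put.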
In the alternative case when the support of the distribution does depend on the unknown parameter $\theta$, I am less happy with the following proof: first, we can wlog consider the simple case when $$ f(x|\theta) = h(x) \mathbb{I}_{A_\theta}(x) \tau(\theta) $$ where the set $A_\theta$ indexed by $\theta$ denotes the support of $f(\cdot|\theta)$. In that case, assuming the median is sufficient, the factorisation theorem implies that $$ \prod_{i=1}^n \mathbb{I}_{A_\theta}(x_i) $$ is a binary ($0$-$1$) function of the sample median $$ \prod_{i=1}^n \mathbb{I}_{A_\theta}(x_i) = \mathbb{I}_{B^n_\theta}(\text{med}(x_{1:n})) $$ Indeed, there is no extra term in the factorisation since it should also be (i) a binary function of the data and (ii) independent of $\theta$. Adding a further observation $x_{n+1}$ whose value is such that it does not modify the sample median then leads to a contradiction, since it may be inside or outside the support set, while $$ \mathbb{I}_{B^{n+1}_\theta}(\text{med}(x_{1:n+1}))=\mathbb{I}_{B^n_\theta}(\text{med}(x_{1:n}))\times \mathbb{I}_{A_\theta}(x_{n+1}) $$
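A small sketch of why the indicator product cannot be a function of the median alone, assuming the Uniform$(0,\theta)$ model as an example of parameter-dependent support (so $A_\theta=[0,\theta]$, $h\equiv 1$, $\tau(\theta)=1/\theta$). The model and the samples are illustrative assumptions. Two samples with the same median can fall on opposite sides of the support constraint for the same $\theta$.

```python
from statistics import median


def uniform_lik(theta, xs):
    """Likelihood of an iid Uniform(0, theta) sample (illustrative model)."""
    if theta <= 0:
        return 0.0
    inside = all(0.0 <= x <= theta for x in xs)  # product of indicators
    return theta ** (-len(xs)) if inside else 0.0


# Same median (0.5), different maxima.
a = [0.2, 0.5, 0.9]
b = [0.2, 0.5, 3.0]

same_median = median(a) == median(b)
lik_a_at_1 = uniform_lik(1.0, a)  # all points inside [0, 1]
lik_b_at_1 = uniform_lik(1.0, b)  # 3.0 lies outside [0, 1]
lik_b_at_4 = uniform_lik(4.0, b)  # positive once theta covers the maximum

print(same_median, lik_a_at_1, lik_b_at_1, lik_b_at_4)
```

Since the likelihood of `b` is positive at $\theta=4$, its $h$ factor cannot vanish, so a factorisation through the median would force the likelihoods of `a` and `b` at $\theta=1$ to be both zero or both positive, which they are not.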

A question from a less technically literate user: what is the takeaway from each of the two cases and the combined takeaway? – Richard Hardy Aug 11 '20 at 09:34
-
You say *and the case is solved*. I am just wondering what the conclusion is. Is it that in this setting (support not depending on $\theta$), median is a sufficient statistic? Or is not a sufficient statistic? What about the second case? What do we learn about median as a sufficient statistic from the two cases taken together? I am not questioning the technical points you make, but I am trying to extract a takeaway message, an answer to the question you raise in the OP. You may think everything is obvious from what you have already written, but it might not be obvious for everyone (e.g. me). – Richard Hardy Aug 13 '20 at 07:05
-
Also, could you please add @RichardHardy in the comments to me? I did not get a notification of your last comment. I came back because I remembered I had posted a comment and I discovered there was an answer already. Thank you! – Richard Hardy Aug 13 '20 at 07:07
-
The answer to the question is that the median cannot be a sufficient statistic, except in trivial cases (like $n=1$). – Xi'an Aug 13 '20 at 07:08
-