  1. The definition of complete statistics is from http://en.wikipedia.org/wiki/Completeness_(statistics)#Definition

    The statistic $s$ is said to be complete for the distribution of $X$ if, for every measurable function $g$ (which must be independent of $\theta$), the following implication holds:

    $E_\theta(g(s(X))) = 0$ for all $\theta$ implies that $P_\theta(g(s(X)) = 0) = 1$ for all $\theta$.

    Let the codomain of the statistic $s$ be $\mathbb R^m$. Is $g$ then a measurable mapping from $\mathbb R^m$ to $\mathbb R$?

    Since $g$ acts on the codomain of $s$, does $g$ not know the sample size $n$ of $X = \{X_1, \dots, X_n\}$? Only $s$ acts on the sample $X$, so does only $s$ know the sample size of $X$?

    But in a solution to problem 6.15 in Casella and Berger's Statistical Inference, when proving that a statistic is not complete, $g$ is chosen to depend on $n$.

    What is $g$ actually? A measurable mapping from $\mathbb R^m$ to $\mathbb R$ which doesn't know the sample size, or something like a statistic which does know the sample size?

    Or does the statistic $s$ also output the sample size $n$ of its input $X$, i.e. is the codomain of a statistic $s$ really $\mathbb R^m \times \mathbb N$, so that $g$ is defined on $\mathbb R^m \times \mathbb N$ and gets to know the sample size from the output of the statistic $s$?

  2. In general (going beyond the concept of complete statistics), when talking about a mapping $g$ on a statistic $s(X)$, i.e. $g(s(X))$, do we always assume $g$ knows the sample size of $X$, i.e. is the sample size of $X$ always an input to $g$?

    Even further, do we assume $g$ knows the entire input sample $X$ (not just its size $n$) to the statistic $s$? E.g. with $s(X) = \sum_i X_i$, does it make sense to write $g(s(X)) = (\sum_i X_i) + (\sum_i X_i^2)$?

See also https://math.stackexchange.com/questions/918632/in-composition-of-two-mappings-can-the-outer-mapping-access-the-arguments-of-th

Thanks.

Tim

1 Answer


Things are clearer if one thinks of completeness as a property of a parametric family of distributions $$ \mathcal F = \{F_\theta: \theta \in \Theta\}, $$ where $F_\theta$ is a probability distribution on (say) $\mathbb R^m$. Then $\mathcal F$ is complete if $$ \int g(x) \, dF_\theta(x) = 0 \mbox{ for all $\theta$} \implies F_\theta(g(x) = 0) = 1 \mbox{ for all $\theta$}. $$ When one talks about a statistic $s(X)$ being complete, where $X$ is an $\mathbb R^n$-valued random vector and $X \sim F^n_\theta$ (i.e. $X_i \stackrel{iid}{\sim} F_\theta$) for some unknown $\theta$, what one really means is that the parametric family $\{G_\theta: \theta \in \Theta\}$ is complete, where $G_\theta$ is the distribution of $s(X)$ when $X \sim F^n_\theta$.
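
For a concrete illustration (a standard textbook example, not tied to any particular exercise): take $F_\theta$ to be the Binomial$(n, \theta)$ distribution with $\theta \in (0,1)$, so that $T \sim F_\theta$ takes values in $\{0, 1, \dots, n\}$. If $$ E_\theta[g(T)] = \sum_{t=0}^{n} g(t) \binom{n}{t} \theta^t (1-\theta)^{n-t} = 0 \mbox{ for all $\theta \in (0,1)$}, $$ then dividing by $(1-\theta)^n$ gives a polynomial in $\theta/(1-\theta)$ that vanishes identically, so every coefficient $g(t)\binom{n}{t}$ is zero and hence $g(t) = 0$ for all $t$. The family $\{\mathrm{Binomial}(n, \theta): \theta \in (0,1)\}$ is therefore complete; equivalently, $s(X) = \sum_{i=1}^n X_i$ is a complete statistic when $X_i \stackrel{iid}{\sim} \mathrm{Bernoulli}(\theta)$.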

The notion of completeness itself therefore has nothing to do with the sample size, but the function $s(X)$ needed to get a complete statistic might change with the sample size. For example, if $X_i \stackrel{iid}{\sim} N(\mu, 1)$ then $s(X) = \sum_{i=1}^n X_i$ depends on the sample size. What the solution to 6.15 shows is that, irrespective of the value of $n$, the parametric family of distributions of the statistic $(\bar X, S^2)$ does not correspond to a complete family. A different $g$ is required for each value of $n$, but for each value of $n$ we know such a function exists, so the statistic is not complete for any sample size.
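
To sketch how that dependence on $n$ arises (assuming 6.15 is the usual curved-normal exercise with $X_i \stackrel{iid}{\sim} N(\theta, a\theta^2)$, $a > 0$ known): since $E_\theta[S^2] = a\theta^2$ and $E_\theta[\bar X^2] = \theta^2 + a\theta^2/n$, the function $$ g_n(\bar X, S^2) = \frac{n}{n+a}\,\bar X^2 - \frac{S^2}{a} $$ satisfies $E_\theta[g_n(\bar X, S^2)] = \theta^2 - \theta^2 = 0$ for all $\theta$, yet $g_n(\bar X, S^2)$ is not almost surely zero. The coefficient $n/(n+a)$ is exactly where the sample size enters $g$: it is needed to make the expectation vanish, but $g_n$ is still just a fixed function on the codomain $\mathbb R^2$ of the statistic $(\bar X, S^2)$.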

The function $g$ is just a mapping from the space $\mathbb R^m$ to the real numbers. $g$ doesn't "know" anything other than its input. But, again, the function $g$ needed to show incompleteness depends on the distribution of $s(X)$ and $s(X)$ itself may depend on the sample size. Indeed, $s: \mathbb R^n \to \mathbb R^m$ so $s$ itself obviously must depend on the sample size.
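
To connect this to your second question: with $s(X) = \sum_i X_i$, something like $g(s(X)) = \big(\sum_i X_i\big) + \big(\sum_i X_i\big)^2$ is a valid composition, because the right-hand side is a function of the single number $\sum_i X_i$. By contrast, $\big(\sum_i X_i\big) + \big(\sum_i X_i^2\big)$ is in general not of the form $g(s(X))$ for any $g$: two samples with the same sum can have different sums of squares (e.g. $(0, 2)$ versus $(1, 1)$), so no function of $\sum_i X_i$ alone can reproduce $\sum_i X_i^2$. $g$ never sees the sample $X$, only the value $s(X)$.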

guy
  • Thanks. (1) In your first paragraph, can you describe in words what it means for a family of distributions to be complete? Looking at your formula, I seem to get the idea, but not really. (2) "the function s(X) needed to get a complete statistic" – do you mean "the function s(X) needed to get a non-complete statistic" instead? (3) According to the definition of complete statistics, does it make sense to say a statistic is complete for some sample size(s), but not complete for the other sample size(s)? – Tim Sep 04 '14 at 14:12
  • @Tim (1) A family is complete if the only unbiased estimator of $0$ is (almost surely) $0$; if $X \sim F_\theta$ where $\{F_\theta\}$ is complete then the only way to estimate $0$ unbiasedly is with $g(X) = 0$ almost surely. (2) When $X = (X_1, \ldots, X_n)$ has iid components, $X$ itself is not complete because $E[X_1 - X_2] = 0$. However, I might introduce a function $s(\cdot)$ so that $s(X)$ is complete; if I *also* choose $s(\cdot)$ so $s(X)$ is sufficient then lots of good things happen. – guy Sep 04 '14 at 16:38
  • @Tim A statistic is defined by a mapping $s: \mathbb R^n \to \mathbb R^m$. If the sample size changes the domain of $s$ also changes, so from a technical point of view, $s$ depends on the sample size. If I change the sample size, the domain of the function changes, so a statistic at sample size $n$ can't even be used as a statistic at a different sample size because it has the wrong domain. People often treat things like $\bar X$ as a statistic without referencing the sample size, but strictly speaking this is an abuse of notation. Really, we have a family of statistics $s_n$ as $n$ changes. – guy Sep 04 '14 at 16:41
  • Thanks. (1) Why do we care about unbiased estimators of the seemingly uninteresting $0$? Is it only because adding an unbiased estimator of $0$ doesn't change the unbiasedness of an unbiased estimator of anything else? (2) Why do we care whether unbiased estimators of $0$ are unique (i.e. $0$ a.s.)? (3) It seems that I have been bothered by the interpretation of complete statistics on Wikipedia for more than a year. Your reply has shed light from a different perspective: the distributions rather than the random variables. I would also appreciate it if you could shed some light here http://stats.stackexchange.com/q/53107/1005 – Tim Sep 04 '14 at 16:50
  • Thanks! "if X is complete and sufficient", what does it mean that a random variable is sufficient? I understand that sufficiency is for a statistic, and haven't heard that it can be for the sample? – Tim Sep 05 '14 at 01:30
  • @Tim If I observe $X_1, \ldots, X_n$ iid then the whole sample $X = (X_1, \ldots, X_n)$ is a sufficient statistic, but is not complete since $E[X_1 - X_2] = 0$. In my previous comment pertaining to completeness, I'm letting $X$ be an arbitrary statistic rather than the entire sample. – guy Sep 05 '14 at 02:06