
This is a theoretical question. It is inspired by a recent question and discussion on the bootstrap, where a constant estimator, i.e. a constant function

$$f(x) = \lambda$$

was used as an example of an estimator to show problems with estimating bias using the bootstrap. My question is not whether it is a "good" or "bad" estimator; since it is independent of the data, it has to be poor. However, while I agree with the definition that Larry Wasserman gives in his handbook *All of Statistics*:

A reasonable requirement for an estimator is that it should converge to the true parameter value as we collect more and more data. This requirement is quantified by the following definition:
6.7 Definition. A point estimator $\hat{\theta}_n$ of a parameter $\theta$ is consistent if $\hat{\theta}_n \overset{P}{\rightarrow} \theta$.

what bothers me is that a $\hat{\theta}_n$ given by a constant function does not approach $\theta$ even as $n \rightarrow \infty$, since it is constant.
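For concreteness, here is a minimal simulation sketch (Python; the true mean $\mu = 3$, the constant $\lambda = 1$, and the normal model are arbitrary choices made purely for illustration) in which the sample mean approaches the truth as $n$ grows while the constant estimator stays put:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, lam = 3.0, 1.0  # true parameter and the constant "estimate"

for n in [10, 100, 10_000, 1_000_000]:
    x = rng.normal(loc=mu, scale=1.0, size=n)
    # The sample mean gets closer to mu; the constant never moves.
    print(f"n={n:>9,d}  sample mean={x.mean():.4f}  constant={lam}")
```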

So my questions are: What makes a constant function an estimator? What justifies it? What are its properties? What are the similarities between a constant function and other estimators? Could you also provide some references?

Tim
  • Re references: Jack Carl Kiefer, *Introduction to Statistical Inference* (Springer-Verlag 1987) includes two exercises, numbers 4.1 and 4.2, which ask the reader to compare several point estimators. In both cases two of those estimators are constant. – whuber Dec 20 '14 at 00:54
  • Clearly a constant estimator is not generally *reasonable*, and so should not be expected to satisfy all the requirements we'd like estimators to satisfy to call them *reasonable*. It is, nevertheless, still an estimator. – Glen_b Dec 20 '14 at 01:07
  • From a mathematical viewpoint, estimators and statistics are (measurable) transforms of the sample vector. This is how they are defined. To exclude those, you have to resort to decision theory (admissibility, minimaxity, etc.) or asymptotics (convergence, consistency) in order to compare and sometimes order estimators. – Xi'an Dec 20 '14 at 08:23

3 Answers


An estimator is simply some function of a potential sample of data that seeks to estimate an unknown population parameter. It's a recipe or a formula. Your constant is an estimator that does not depend on the data at all: the estimate it produces will always be the same.

There are infinitely many estimators, and most of them are "bad". What does that mean? Estimators have desirable properties, which lead them to produce "good" estimates under certain conditions. Some of these are

  • Low computational cost
  • Unbiasedness
  • Consistency
  • Efficiency
  • Robustness (insensitivity to violations of the assumptions under which the estimator retains its desirable properties)

These goals are often at odds with each other. The constant has the lowest possible computational cost, but arguably none of the other properties.
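As a rough illustration (a minimal sketch; the normal model with mean 3, the sample size of 50, and the constant $\lambda = 1$ are arbitrary assumptions of mine), one can compare the bias and variance of the constant and of the sample mean over many repeated samples:

```python
import numpy as np

rng = np.random.default_rng(42)
mu, lam, n, reps = 3.0, 1.0, 50, 20_000

# Many samples of size n; apply each estimator to every sample
samples = rng.normal(loc=mu, scale=2.0, size=(reps, n))
estimates = {
    "sample mean": samples.mean(axis=1),  # uses the data
    "constant":    np.full(reps, lam),    # ignores the data
}

for name, est in estimates.items():
    print(f"{name:11s}  bias={est.mean() - mu:+.3f}  variance={est.var():.4f}")
```

The constant never varies (zero variance), but it is badly biased unless $\lambda$ happens to sit near the truth, which illustrates why being good on a single criterion is not enough.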

dimitriy
  • sufficiency too! – mugen Dec 19 '14 at 23:17
  • Estimators do not need to be "algebraic" functions. Indeed, many are not. The bullet list of desirable properties is missing the most important one, both theoretically and practically: the estimator's *risk* (expected loss conditional on the parameter values). – whuber Dec 20 '14 at 00:23
  • Unbiasedness is a rare property in that most transforms of a parameter $\theta$, $h(\theta)$, do not allow for the existence of an unbiased estimator for all values of $\theta$. – Xi'an Dec 21 '14 at 20:59

I think it's not so much a question of "what makes a constant function an estimator" as "what makes an estimator an estimator". From a mathematical point of view, an estimator is a function of a special kind: it is a random variable (a function of the sample) that fulfills one requirement, namely that it is a statistic, which means it must not depend on $\theta$ (its "estimand"). A constant function does not depend on $\theta$ (it does not depend on anything :).

Example: $T = \bar{X}$ is a statistic (and an estimator of $\mu$), whereas $S = \bar{X} - \mu$ is not a statistic, because it depends on $\mu$ itself.

So a constant function is an object that possesses these qualities, which justifies calling it "an estimator".

The important point is that we do not want just "any" estimator. An estimator may be biased, which means that on average it adds or subtracts something from the true value. For example, you want your bathroom scale to show your weight exactly as it is (unless you prefer a scale that flatters you a little :).

We want an estimator that minimizes the mean squared error, $MSE = E(\hat{\theta}-\theta)^2$. But no single estimator minimizes this error for every value of $\theta$; instead there are whole families of estimators to choose from. So which one is the best? A good estimator is one that fulfills certain requirements. Three common ones are:

  • unbiasedness: on average it does not add or subtract anything; mathematically, $E\hat{\theta} = \theta$
  • consistency (this is the definition quoted from your book)
  • efficiency, which refers to the estimator's variance: we want as small a variance as possible

Another answer mentions computational cost, but that is not a mathematical/probabilistic issue.

Thus a constant function actually is an estimator; however, it is not a desirable one, because it is biased (unless $\lambda$ happens to equal $\theta$) and not consistent (as you noticed). These are the differences between a constant function and other (good) estimators.
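To make the comparison concrete, here is a minimal sketch (Python; the normal model and the particular values of $\theta$, $\sigma$, $n$, and the two constants are my own arbitrary choices) estimating the MSE of the sample mean and of two constant estimators by simulation, using the decomposition $MSE = \text{bias}^2 + \text{variance}$:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, n, reps = 3.0, 2.0, 50, 20_000

samples = rng.normal(loc=theta, scale=sigma, size=(reps, n))
estimators = {
    "sample mean":  samples.mean(axis=1),
    "constant 1.0": np.full(reps, 1.0),  # far from theta: large bias
    "constant 2.9": np.full(reps, 2.9),  # happens to sit near theta
}

for name, est in estimators.items():
    bias, var = est.mean() - theta, est.var()
    mse = ((est - theta) ** 2).mean()  # should match bias^2 + var
    print(f"{name:12s}  bias^2+var={bias**2 + var:.4f}  MSE={mse:.4f}")
```

A constant can even win on MSE when $\lambda$ happens to lie close to the true value, but that is pure luck: unlike the sample mean, its error does not shrink no matter how much data you collect, which is exactly the lack of consistency noted above.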

This is more or less my answer to your question. Going further would require digging into the mathematics to show more differences and similarities.

Lil'Lobster

Constant estimators/predictors have a use as benchmarks against which one judges the performance of "proper" estimators/predictors.
A standard example is binary logistic regression, where we attempt to estimate conditional probabilities, exploiting the information that may reside in the regressors in order to predict better, in some sense, the probability attached to the dependent variable,

$$P(Y_i=1 \mid \mathbf x_i) = \Lambda(g(\mathbf x_i'\beta))$$

where $\Lambda()$ is the Logistic cumulative distribution function, and $g(\mathbf x_i'\beta)$ is the logit.

But since we have the sample available, we can also very cheaply estimate the unconditional probability, $$\hat P(Y=1) = \frac 1n \sum_{i=1}^n y_i$$

We can then compare the predictive performance of $\hat P(Y_i=1 \mid \mathbf x_i) = \Lambda(g(\mathbf x_i'\hat \beta))$ against the "naive" (and constant) estimator $\hat P(Y=1)$. The former should do better; otherwise all the trouble we went to in trying to use the information about the probability of $Y$ contained in the $X$'s did not pay off.
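As a small illustration (a minimal sketch; the simulated data, the use of scikit-learn, and log-loss as the comparison metric are my own assumptions rather than anything prescribed in the answer), fit the conditional model and compare it with the "naive" benchmark:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
n = 2_000

# Simulate data in which x genuinely carries information about P(Y=1 | x)
X = rng.normal(size=(n, 1))
p_true = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * X[:, 0])))
y = rng.binomial(1, p_true)

# Conditional estimates from the logistic regression
p_cond = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

# "Naive" benchmark: the unconditional sample proportion, used for every i
p_naive = np.full(n, y.mean())

print("log-loss, logistic regression   :", round(log_loss(y, p_cond), 4))
print("log-loss, unconditional benchmark:", round(log_loss(y, p_naive), 4))
```

If the regressors are informative, the conditional model's log-loss should come out noticeably lower than the benchmark's; if it does not, the extra modelling effort did not pay off.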

A CV thread exactly on this issue can be found here (look also at the comments).

Alecos Papadopoulos
  • $\hat P(Y=1)$ is far from constant! It clearly depends on the data. A *constant* estimator, as Dimitriy Masterov points out, is independent of the data. I do applaud your effort to find a practical use for constant estimators. One interesting one might be, when faced with Bernoulli$(p)$ data you mistrust, to resort to the estimate $p=1/2$ no matter what. ("I have so little information that I'm just going to assume this process has a 50-50 chance of either outcome.") – whuber Dec 20 '14 at 00:24
  • @whuber It certainly depends on the data, and this is clearly shown in my answer. But it is _used_ as a constant in the specific evaluation exercise I described, isn't it? Since we obtain just one point estimate over the sample. – Alecos Papadopoulos Dec 20 '14 at 00:28
  • You are missing the entire point of the question, then: it concerns estimators that *do not depend on the data at all.* They are constant regardless of what data are collected. *Every* (non-randomized) estimator is constant conditional on the data! – whuber Dec 20 '14 at 00:30
  • @whuber Well then I also don't understand the following: we obtain estimates of the betas, which indeed are single point estimates. But then we obtain many conditional probability estimates, one for each observation, since the subset of the data changes for each one. And then we have only one estimated unconditional probability to compare them to, because _it_ depends on the whole data set. So the estimated probability remains constant with respect to the whole data set, while the conditional probabilities don't. What should we call this? – Alecos Papadopoulos Dec 20 '14 at 00:36
  • There's no need to discuss such complications. By definition, any definite procedure that (a) produces a number which (b) is intended to reflect a true underlying value is an *estimator*. When it (c) has the potential to vary from one dataset to the next, it is not constant. The *only* constant estimators are those of the form "ignore the data and declare that the true underlying value is $\lambda$", where $\lambda$ is some real number. It sounds like you are using "constant" in a very different, completely unrelated sense. – whuber Dec 20 '14 at 00:47
  • @whuber Hmm, yes, I have to rethink about that, thanks. – Alecos Papadopoulos Dec 20 '14 at 00:48
  • @whuber I regret that I can learn about your points only from the comments, since all of them are very interesting and seem to answer the question; given their narrow (comment) nature, it seems I'll have to wait patiently for more discussion and for answers you could expand them into ;) – Tim Dec 20 '14 at 07:46