4

We seem to distinguish empirical estimates of parameters from 'true' values, and make comparisons between the two. I can understand what an empirical estimate is. What is a 'true' value?

For instance, my course notes have:

Definition 4.7 (Consistent estimator) An estimator $T(X_1,...,X_n)$ for a parameter $\theta$ is consistent if, for any $\epsilon>0$, we have $$ \lim_{n\rightarrow\infty}P(|T(X_1,...,X_n)-\theta|<\epsilon)=1 $$

How do we determine $\theta$ apart from by estimating it, in order to compare it to an estimator?

mjc
  • 265
  • 2
  • 9
  • Could you share some context? Where did you see this term? It may be important for getting an adequate answer. – Tim May 20 '20 at 12:30
  • I agree with Tim. Context will be crucial. Do you mean the $p$ parameter in a binomial or a Bernoulli distribution? – Dave May 20 '20 at 12:31
  • @Tim I've edited the OP to add an example. – mjc May 20 '20 at 12:35
  • @Dave See above. – mjc May 20 '20 at 12:35
  • 1
    A parameter is any definite quantitative property of the "box" that models a population or process. See https://stats.stackexchange.com/a/54894/919. – whuber May 20 '20 at 12:42
  • @whuber How does this sound? What I called the 'true' value is really the 'modelled' value. This is arbitrary in principle and usually empirically or theoretically determined in practice. If we want, we can assign a probability parameter of 0.01 to throwing heads on an unweighted coin. Once we've conducted a long experiment and observed empirically that it lands heads at a rate of around 0.5, we start to think 0.5 might be a better probability parameter to use, and tune our model to reflect observations. – mjc May 20 '20 at 13:06
  • 2
    @mjc the modeled value is always an estimator. What you describe, first assuming that the probability is $0.01$ and then later observing that $0.5$ is better, is the way the estimator tends probabilistically towards the true value. So anything that you use in practice is always (except simulation studies) an estimator. In your example you assumed that the coin ACTUALLY has a probability of $0.5$. That is the true value, you gave it by DEFINITION. In practice the true value can never be determined or observed, it can only be estimated. – LiKao May 20 '20 at 13:22
  • @LiKao So we have three values, I think. 1) The empirically observed value; 2) the theoretically modeled value; and 3) the unobserved and unobservable 'true' value, a property of physics under a suitable philosophy (e.g. Platonism). Would you agree? – mjc May 20 '20 at 13:28
  • 1
    @mjc Not quite. 3 is correct. But whenever you "observe" something, you have to make inferences. You can't "observe" the mean of a sample; you have to compute it from the data. Likewise, you can't "observe" the probability of a coin; you have to calculate it. So 1 and 2 should be the same thing. If you calculate the probability of a coin from observed tosses, you are using a model. So what you call "observation" (dividing heads by total tosses) is fitting a model. Any model should provide a consistent estimator, so 1 and 2 are just two different (hopefully) consistent estimators. – LiKao May 20 '20 at 13:34
  • @LiKao How's this? We can take a function $f$ (e.g. sample mean) of a dataset $d\subset D$ and make one of two uses of it: a) use the output of $f$ as a parameter $\theta_1$ in a model $M$; or b) use $f$ itself as an estimator $T$ for some other parameter, $\theta_2$. Used as an estimator, $f=T$ will be consistent i) with observations on the whole of $D$ iff $d$ happens to be perfectly representative of $D$; ii) with $\theta_2$ if $f=T$ converges to $\theta_2$ over a given (perhaps infinite) number of observations. – mjc May 20 '20 at 14:01
  • 1
    @mjc Whether $f$ is consistent with $\theta_1$ does not depend on $d$. Consistency is a theoretical property of an estimator, not one that depends on the sample $d\subset D$. That's why it must converge probabilistically, not deterministically. If you get a bad sample, your estimate can be off by as much as you like; it's just very improbable to get such a bad sample if you have a lot of observations. As for the model: you mean a predictive model? The best predictive model should match the generating process, in which case $\theta_1=\theta_2$. For wrong models (i.e. all models): more complicated. – LiKao May 20 '20 at 20:55

2 Answers

10

In general the "true value" is a fiction: it is defined within a model that in reality won't fit perfectly, and in that case there is consequently no such thing as a "true value" either. Assuming that a true parameter value exists is a device for doing theory and developing methods. It allows us to show theoretically that this-or-that estimation method has this-or-that property and works better or worse, which motivates these methods even though the assumption doesn't correspond exactly to the real situation (but then no model does).

If we simulate artificial data, however, we can fix and control the true parameter values, in which case we can compare the estimate to the true value (ignoring here potential issues with random number generation).
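To make the simulation idea concrete, here is a minimal Python sketch (not from the answer itself; the parameter value and sample size are arbitrary illustrative choices). Because we generate the data ourselves, the "true" parameter is known by construction and the estimate can be compared to it directly:

```python
import random

random.seed(42)

# We fix the "true" Bernoulli parameter ourselves, so unlike in real
# data analysis it is known and the estimate can be checked against it.
p_true = 0.3        # true success probability, known because we chose it
n = 100_000

sample = [1 if random.random() < p_true else 0 for _ in range(n)]
p_hat = sum(sample) / n   # empirical estimate of p_true

print(abs(p_hat - p_true))  # small for large n
```

In real data there is no `p_true` to print; this comparison is only possible because the data-generating process was defined by us.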

There are also some real situations in which true values can be "controlled" or known to some extent. For example, if we have a measurement instrument that is meant to measure a certain quantity with stochastic measurement error, in some situations we may be able to control the quantity that is measured. This still cannot guarantee the truth of the measurement-error model within which the model parameter is defined, but at least we can control the real quantity that is interpreted as corresponding to the true parameter value. An example is indirect estimation from age determination methods applied to individuals whose precise age we know.

Sometimes we estimate, from a sample, parameters that correspond to existing quantities of a (usually big but) finite population, such as the population mean of something that is well defined for all population members (e.g. age). In that case the population quantity corresponds to the "true" parameter value.
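This finite-population case can also be sketched in a few lines of Python (the population and sample here are invented for illustration). The population mean exists and could in principle be computed exactly, and a sample mean estimates it:

```python
import random

random.seed(0)

# Hypothetical finite population of ages; the population mean is the
# "true" parameter and is computable because the population is finite.
population = [random.randint(18, 90) for _ in range(10_000)]
theta = sum(population) / len(population)   # true value: the population mean

sample = random.sample(population, 500)     # sample without replacement
estimate = sum(sample) / len(sample)        # sample mean as an estimate

print(theta, estimate)
```

Here "true" is unproblematic: it is a property of the actual finite population, not of a model.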

Christian Hennig
  • 10,796
  • 8
  • 35
3

It all boils down to theory vs. practice.

The true value of a parameter is always a theoretical quantity. Thus, you can never determine the true $\theta$.

The idea behind this is that there is some kind of process which generates the data, and this process has some parameter. If you knew the process and that parameter, you could generate data that would be indistinguishable from the real process. Any other parameter generates data that can eventually be distinguished from the real process.

The only way to "know" the true $\theta$ is to define the process mathematically. For example, if you define a fair coin, then the true probability of showing heads is $0.5$, not because you determined it by an experiment, but because you defined it to be so.

A consistent estimator will tend (probabilistically!) towards the true value, which is what the definition you provided captures. As you collect more data, the probability that the consistent estimator differs from the true value by more than a given amount $\epsilon$ goes to zero (NOTE: it is a common misconception that probability zero means something can never happen, but that is not necessarily true. It only means it practically never happens).
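The definition can be watched in action with a small Monte Carlo sketch (my own illustration; `theta`, `eps`, and the sample sizes are arbitrary choices). We take a fair coin, use the proportion of heads as the estimator $T_n$, and estimate $P(|T_n-\theta|\geq\epsilon)$ by repeated simulation; it shrinks as $n$ grows:

```python
import random

random.seed(1)

theta, eps, reps = 0.5, 0.05, 2000   # true value, tolerance, repetitions

def miss_rate(n):
    """Monte Carlo estimate of P(|T_n - theta| >= eps) at sample size n."""
    misses = 0
    for _ in range(reps):
        # T_n: proportion of heads in n tosses of a fair coin
        t = sum(random.random() < theta for _ in range(n)) / n
        if abs(t - theta) >= eps:
            misses += 1
    return misses / reps

rates = {n: miss_rate(n) for n in (10, 100, 1000)}
for n in (10, 100, 1000):
    print(n, rates[n])   # the miss rate shrinks as n grows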

So if you want to "determine" a true value based on the data alone, a consistent estimator is your best method. You can never be sure that you really got the right value (which is why any estimate should come with confidence intervals, etc.).

Now, how do you know an estimator is consistent if you can't know the true value in practice? Whether an estimator is consistent is not a practical observation but a theoretical property. Thus, you prove (theory!) that something is a consistent estimator.

Take, for example, the expected value and variance (true parameters) of a normally distributed variable (the process). The sample mean is a consistent estimator of the expected value. So you define $X \sim \mathrm{Normal}(\mu,\sigma^2)$ (i.e. $\theta = (\mu,\sigma)$), and then you prove (!) that the sample mean tends probabilistically towards $\mu$ for any $\mu$ and $\sigma$. Then you know that the method you have just proven can be applied in practice to estimate the true $\mu$.
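That normal-distribution example can be sketched in Python as well (the values of $\mu$ and $\sigma$ are arbitrary choices for illustration; the convergence itself is what the consistency proof guarantees):

```python
import random

random.seed(7)

# Illustrative true parameters; known here only because we simulate.
mu, sigma = 2.0, 3.0

errors = {}
for n in (10, 1_000, 100_000):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    errors[n] = abs(sum(xs) / n - mu)   # error of the sample mean
    print(n, errors[n])                 # tends to shrink as n grows
```

For any single run the error need not decrease monotonically (consistency is probabilistic), but for large $n$ it is very probably small.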

LiKao
  • 2,329
  • 1
  • 17
  • 25