I'm currently very confused in my stats class about what a biased estimator is. Does anyone know of a good, simple example of one where it's easy to understand why it's biased and how to calculate the bias?
-
Take $\delta(x_1,\ldots,x_n)=3$ as the estimator of the mean $\theta$ of the iid $x_i$'s. This estimator is constant, with expectation $3$, and thus differs from $\theta$ whenever $\theta\ne 3$. This difference means that $\delta$ is a biased estimator. – Xi'an Dec 15 '20 at 20:43
-
Sample variance divided by n rather than n-1 is a rather classic example – astel Dec 15 '20 at 21:00
-
Please add the [tag:self-study] tag & read its [wiki](https://stats.stackexchange.com/tags/self-study/info). Then tell us what you understand thus far, what you've tried & where you're stuck. We'll provide hints to help you get unstuck. Please make these changes as just posting your homework & hoping someone will do it for you is grounds for closing. – kjetil b halvorsen Dec 15 '20 at 21:18
5 Answers
There are many examples. Here is a nice one:
Suppose you have an exponentially distributed random variable with rate parameter $\lambda$, so with density $\lambda e^{-\lambda x}$ and expectation $\frac{1}{\lambda}$, and you want to estimate $\lambda$ from $n$ independent samples.
A natural estimator (and the maximum likelihood estimator) is $\hat\lambda = \dfrac{n}{\sum x_i}$ but this is biased.
When $n=1$ you have $\mathbb E\left[\frac1X\right]=\int_0^\infty \frac{\lambda}{x} e^{-\lambda x}\,dx =\infty$ and you cannot get much more biased than that. When $n\ge 2$ you get $\mathbb E\left[\hat \lambda \right] = \frac{n}{n-1} \lambda$, which is still biased, though less so as $n$ increases.
One explanation of this is that its reciprocal $\frac{1}{\hat\lambda}=\frac{\sum x_i}{n}$ is an unbiased estimator of $\frac1\lambda$, since $\mathbb E\left[\frac{\sum X_i}{n}\right] = \frac1n \sum \mathbb E\left[X_i\right] = \frac1\lambda$. Considering $\mathbb E\left[\frac{n}{\sum X_i}\right]$ is like taking the (larger) arithmetic mean when you really want to take the (smaller) harmonic mean. So it should not be a surprise that you get a result that is biased upwards.
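As a quick numerical check (my own sketch, not part of the original answer), a simulation confirms the $\frac{n}{n-1}\lambda$ factor; the rate, sample size, and seed below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n, reps = 2.0, 5, 200_000

# rate lam corresponds to scale 1/lam in NumPy's parameterisation
samples = rng.exponential(scale=1 / lam, size=(reps, n))
lam_hat = n / samples.sum(axis=1)      # MLE n / sum(x_i) for each replication

print(np.mean(lam_hat))        # roughly 2.5 = (5/4) * 2.0 -> biased upwards
print(n / (n - 1) * lam)       # theoretical E[lambda_hat] = n/(n-1) * lambda
print(np.mean(1 / lam_hat))    # roughly 0.5 = 1/lam -> the reciprocal is unbiased
```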

-
I personally think your comment "So it should not be a surprise that you get a result that is biased upwards" is weak. My answer actually presents a source that explains why a non-linear transformation can induce a bias, so it isn't a "surprise". – AJKOER Dec 16 '20 at 19:54
Let $X_1, \ldots, X_n\sim N(\mu, \sigma^2)$; then $\overline{X}$ is an unbiased estimator of $\mu$ since $E(\overline{X}) = \mu$. Now take $T=\overline{X}+1$. Then $E(T) = \mu + 1$, so by the definition the bias of $T$ is $E(T) - \mu = 1$.
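As a small numerical illustration (my own sketch, not part of the answer), one can estimate this bias by simulation; the values of $\mu$, $\sigma$, $n$, and the seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 3.0, 2.0, 30, 100_000   # arbitrary illustrative values

x = rng.normal(mu, sigma, size=(reps, n))
T = x.mean(axis=1) + 1.0                     # the estimator T = Xbar + 1

print(np.mean(T) - mu)   # roughly 1.0, matching bias(T) = E[T] - mu = 1
```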

Perhaps the most common example of a biased estimator is the MLE of the variance for IID normal data:
$$S_\text{MLE}^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2.$$
This variance estimator is known to be biased (see e.g., here), and it is usually corrected by applying Bessel's correction (dividing by $n-1$ instead of $n$) to obtain the usual sample variance instead.
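For a numerical illustration (my own sketch, not from the answer), NumPy's `ddof` argument switches between the two divisors; $\sigma^2$, $n$, and the seed below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, n, reps = 4.0, 10, 200_000           # arbitrary illustrative values

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2_mle = x.var(axis=1, ddof=0)       # divides by n     -> the MLE, biased
s2_bessel = x.var(axis=1, ddof=1)    # divides by n - 1 -> Bessel-corrected

print(np.mean(s2_mle))      # roughly 3.6 = (n-1)/n * sigma^2
print(np.mean(s2_bessel))   # roughly 4.0 = sigma^2
```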

A modern view of a deliberately biased estimator is kernel-based system identification, also known as ReLS. See "A shift in paradigm for system identification" (https://www.tandfonline.com/doi/pdf/10.1080/00207179.2019.1578407) and "Kernel methods in system identification, machine learning and function estimation: A survey" (https://www.sciencedirect.com/science/article/abs/pii/S000510981400020X) for more details.
Older techniques (which, most likely, are what you are being taught) have always been of questionable value because of the lack of assured convergence and other problems.

Apparently, just taking the square root of the unbiased estimate of the variance gives a biased estimator of the standard deviation, since in statistical theory the expected value of the proposed statistic should equal the true value.
Here is a confirming comment from Wikipedia, to quote:
In statistics, Bessel's correction is the use of $n-1$ instead of $n$ in the formula for the sample variance and sample standard deviation, where $n$ is the number of observations in a sample. This method corrects the bias in the estimation of the population variance. It also partially corrects the bias in the estimation of the population standard deviation.
And, to make matters worse, there is no closed-form solution either, as also noted in a Wikipedia article:
However, for statistical theory, it provides an exemplar problem in the context of estimation theory which is both simple to state and for which results cannot be obtained in closed form. It also provides an example where imposing the requirement for unbiased estimation might be seen as just adding inconvenience, with no real benefit.
Now, to answer the question as to why, Wikipedia does provide an explanation; to continue quoting:
One way of seeing that this is a biased estimator of the standard deviation of the population is to start from the result that $s^2$ is an unbiased estimator for the variance $\sigma^2$ of the underlying population if that variance exists and the sample values are drawn independently with replacement. The square root is a nonlinear function, and only linear functions commute with taking the expectation. Since the square root is a strictly concave function, it follows from Jensen's inequality that the square root of the sample variance is an underestimate.
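As a small numerical check (my own sketch, not part of the Wikipedia quote), a simulation shows $E[s] < \sigma$ even after Bessel's correction; $\sigma$, $n$, and the seed below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, n, reps = 2.0, 10, 200_000            # arbitrary illustrative values

x = rng.normal(0.0, sigma, size=(reps, n))
s = x.std(axis=1, ddof=1)    # square root of the Bessel-corrected variance

print(np.mean(s))   # noticeably below sigma = 2.0 (about 1.95 for n = 10)
```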
I hope this example helps.

-
Unusually, Wikipedia fails us here with its careless equation of "unbiased" with "being an underestimate." The former refers to an expected value while the latter refers to a specific value of a statistic. The point is that even when you use an estimator that has a low bias, its particular value in a given case could still happen to be an *overestimate.* Because this distinction is fundamental to the question it would be well to heed it. – whuber Dec 15 '20 at 21:36