
Suppose I have two estimators: one is unbiased and the other is biased, but the biased one has a smaller MSE (mean squared error) than the unbiased one.

Can we figure out the better one in this case? If yes, then which one is the better estimator and why?

n.u2877
  • Better in what sense? MSE is a typical evaluation metric, so you have your answer if you want to judge based on MSE. – Dave Oct 23 '20 at 18:52
  • Overall, which one is better? – n.u2877 Oct 23 '20 at 18:55
  • "Best" has meaning only in terms of the criterion. If you prize unbiasedness over all else, then you will not favor a biased estimator with a smaller MSE. See my answer for an 'established' case of this. – BruceET Oct 23 '20 at 20:05

1 Answer


Suppose you have a random sample with $n = 5$ observations from a normal distribution with unknown $\mu$ and $\sigma^2.$ In estimating $\sigma^2,$ the usual sample variance $V_1 = \frac{1}{n-1}\sum_{i=1}^n(X_i-\bar X)^2$ is unbiased for $\sigma^2:$ $E(V_1) = \sigma^2.$

By contrast, the maximum likelihood estimator of $\sigma^2,$ which is $V_0 = \frac{1}{n}\sum_{i=1}^n(X_i-\bar X)^2,$ is biased but has smaller MSE. [This is true for any $n \ge 2,$ but I chose $n=5$ so that the bias of $V_0$ (negligible for large and moderate $n)$ will be unmistakable in my simulation.]

set.seed(2020)
m = 10^6;  n = 5;  mu = 100;  sg = 10    # m samples of size n from Norm(mu, sg)
v1 = replicate(m, var(rnorm(n,mu,sg)))   # unbiased sample variances (divisor n-1)
v0 = (n-1)*v1/n                          # MLEs (divisor n)
mean(v0);  mean(v1)
[1] 79.95946  # approx. E(V0) = 80 < 100
[1] 99.94932  # approx. E(V1) = 100
mean((v0-sg^2)^2)
[1] 3606.298  # approx. MSE(V0) = 3600 < MSE(V1)
mean((v1-sg^2)^2)
[1] 5007.307  # approx. MSE(V1) = 5000

For $n = 5$ and $\sigma^2 = 100,$ the exact values are $E(V_0) = 80$ and $E(V_1) = 100.$ Also, $MSE(V_0) = Var(V_0) + [E(V_0) - \sigma^2]^2 = 3200 + 400 = 3600 < MSE(V_1) = Var(V_1) = 5000.$
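
These exact values follow from the standard fact that, for normal data, $(n-1)V_1/\sigma^2 \sim \mathsf{Chisq}(n-1),$ which has variance $2(n-1).$ A short derivation, with $n = 5$ and $\sigma^2 = 100$ plugged in at the end of each line:

$$
\begin{aligned}
Var(V_1) &= \frac{2\sigma^4}{n-1} = \frac{2(100)^2}{4} = 5000,\\
Var(V_0) &= \left(\frac{n-1}{n}\right)^2 Var(V_1) = \frac{2(n-1)\sigma^4}{n^2} = \frac{16}{25}(5000) = 3200,\\
E(V_0) - \sigma^2 &= \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{\sigma^2}{n} = -20,\\
MSE(V_0) &= Var(V_0) + [E(V_0)-\sigma^2]^2 = 3200 + 400 = 3600.
\end{aligned}
$$

In general, $MSE(V_0) = (2n-1)\sigma^4/n^2$ and $MSE(V_1) = 2\sigma^4/(n-1),$ and the first is smaller for every $n \ge 2$ because $(2n-1)(n-1) = 2n^2 - 3n + 1 < 2n^2.$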

Histograms of v1 and v0:

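[Figure: histograms of v1 (top, "Unbiased Sample Variance") and v0 (bottom, "MLE of Population Variance"), each with a dotted red vertical line at 100.]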

par(mfrow = c(2,1))                      # two panels, stacked
hdr1 = "Unbiased Sample Variance"
hist(v1, breaks=30, prob=TRUE, xlim=c(0,800), col="skyblue2", main=hdr1)
abline(v=100, col="red", lty="dotted")   # true sigma^2
hdr2 = "MLE of Population Variance"
hist(v0, breaks=30, prob=TRUE, xlim=c(0,800), col="skyblue2", main=hdr2)
abline(v=100, col="red", lty="dotted")   # true sigma^2
par(mfrow = c(1,1))                      # restore single-panel layout

Note: A few authors have advocated use of the MLE, bias notwithstanding. However, traditional methods of inference for variances using the chi-squared distribution would have to be altered to use the MLE, and many statisticians believe underestimating $\sigma^2$ is a strong argument against the MLE. (Another complication is that dividing by $n+1$ results in an even greater decrease in MSE.)
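
To make the remark about the divisor $n+1$ concrete, here is a minimal sketch that computes the exact MSE of $\frac{1}{d}\sum_{i=1}^n(X_i-\bar X)^2$ for divisors $d = n-1,\, n,\, n+1.$ It assumes normal data, so that the sum of squares divided by $\sigma^2$ is chi-squared with $n-1$ degrees of freedom. The helper name `mse_div` is hypothetical, introduced only for this illustration.

```r
# Exact MSE of SS/d as an estimator of sigma^2, assuming normal data,
# where SS = sum((X_i - Xbar)^2) and SS/sigma^2 ~ Chisq(n-1).
# mse_div is a hypothetical helper, not part of base R.
mse_div = function(d, n, sigma2) {
  k = n - 1                            # degrees of freedom of SS/sigma^2
  variance = 2 * k * sigma2^2 / d^2    # Var(SS/d)
  bias = (k / d - 1) * sigma2          # E(SS/d) - sigma^2
  variance + bias^2                    # MSE = variance + squared bias
}

n = 5;  sigma2 = 100
mse = mse_div(d = c(n - 1, n, n + 1), n = n, sigma2 = sigma2)
names(mse) = c("n-1", "n", "n+1")
round(mse, 1)
```

With $n = 5$ and $\sigma^2 = 100$ this gives about 5000, 3600, and 3333.3, so the divisor $n+1$ has the smallest MSE of the three even though it underestimates $\sigma^2$ the most.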

BruceET