
Consider a sample $X_1,X_2,\ldots,X_n$ from a univariate $N(\mu,\sigma^2)$ distribution where $\mu,\sigma^2$ are both unknown. It is known that under squared error loss, the sample variance $s^2=\frac1{n-1}\sum\limits_{i=1}^n (X_i-\overline X)^2$ is inadmissible for estimating $\sigma^2$, because the estimator $\left(\frac{n-1}{n+1}\right)s^2=\frac1{n+1}\sum\limits_{i=1}^n(X_i-\overline X)^2$ has uniformly smaller risk.
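(As a quick numerical sanity check of this domination claim, here is a minimal Monte Carlo sketch in Python; the sample size, true parameters, and replication count are arbitrary choices, not anything canonical.)

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 200_000          # arbitrary sample size and replication count
mu, sigma2 = 3.0, 4.0          # arbitrary true parameters

X = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
ss = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)  # sum of squared deviations

mse_s2  = np.mean((ss / (n - 1) - sigma2) ** 2)  # estimated risk of s^2
mse_np1 = np.mean((ss / (n + 1) - sigma2) ** 2)  # estimated risk of (n-1)/(n+1) * s^2
print(mse_s2, mse_np1)  # the n+1 divisor should give the smaller estimated risk
```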

Now, is this second estimator itself admissible under the same loss function? It certainly has the minimum risk among estimators of the form $cs^2$, but how do we know there isn't another estimator outside this class with a smaller risk?

I have the same question for when $\mu$ is known. If $\mu=0$, then it can be shown that $T=\frac1n\sum\limits_{i=1}^n X_i^2$ is not admissible under squared error loss for estimating $\sigma^2$ because there is a better estimator $\left(\frac{n}{n+2}\right)T=\frac1{n+2}\sum\limits_{i=1}^n X_i^2$. But I don't know if $\left(\frac{n}{n+2}\right)T$ is admissible or not.
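(The same kind of Monte Carlo check, adapted to the known-mean case; again the settings are arbitrary.)

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, sigma2 = 10, 200_000, 4.0   # mu = 0 is known here

X = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
ss = (X ** 2).sum(axis=1)            # sum of squares about the known mean

print(np.mean((ss / n - sigma2) ** 2),        # estimated risk of T
      np.mean((ss / (n + 2) - sigma2) ** 2))  # estimated risk of (n/(n+2)) T, should be smaller
```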


I finally found an accessible reference for both these problems in Lehmann/Casella's Theory of Point Estimation (2nd ed, pages 330-334).


1 Answer


Assume that $X_i \sim \mathcal{N}(\mu, \sigma^2)$ are i.i.d. with unknown $\mu$ and $\sigma^2$ and that the loss function is $\left(\delta(\mathbf{X}) - \sigma^2 \right)^2$. Consider the reference estimator $\delta_0(\mathbf{X}) = \sum_{i=1}^n (X_i-\bar{X})^2/(n+1)$.

Stein (1964) found an estimator $\delta_{(\nu)}$ which dominates $\delta_0$ for any fixed choice of $\nu$. The estimator is $$\delta_{(\nu)}(\mathbf{X}) = \min\left\{ \delta_0(\mathbf{X}), \frac{1}{n+2} \sum_{i=1}^n (X_i - \nu)^2\right\}.$$ At a high level, this estimator chooses between an estimator with unknown mean (estimated by the sample mean) and an estimator which commits to the fixed $\nu$ as the true mean and thus has one more degree of freedom. Stein writes that "It is interesting to observe that the estimator [$\delta_{(\nu)}$] may be obtained by first testing the hypothesis $\mu=\nu$ at an appropriate significance level and using the estimate [$\frac{1}{n+2} \sum_{i=1}^n (X_i - \nu)^2$] if the hypothesis is accepted and the estimate [$\delta_0(\mathbf{X})$] if the hypothesis is rejected."
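Here is a minimal Monte Carlo sketch of $\delta_{(\nu)}$ (my own illustration, not from Stein's paper; the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, nu = 10, 200_000, 0.0   # nu is the fixed "bet" on the mean
mu, sigma2 = 0.5, 4.0            # true parameters, chosen arbitrarily

X = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
delta0 = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) / (n + 1)
bet    = ((X - nu) ** 2).sum(axis=1) / (n + 2)   # commits to mu = nu
stein  = np.minimum(delta0, bet)                 # Stein's delta_(nu)

print(np.mean((delta0 - sigma2) ** 2),  # estimated risk of delta_0
      np.mean((stein - sigma2) ** 2))   # estimated risk of delta_(nu); no larger,
                                        # and noticeably smaller when nu is near mu
```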

From my perspective, the estimator $\delta_{(\nu)}$ bets on $\nu$ being the true mean but is allowed to "shirk" out of the bet. This is a kind of post-selection inference, which is common in modern high-dimensional statistics, but here there is no need to control for the selection.

Note: I was initially surprised to see this from Stein, since in my experience estimators of covariances are (sensibly) compared with scale-invariant loss functions. On this point, in the linked article, Stein writes "I find it hard to take the problem of estimating a [variance] with quadratic loss function very seriously" and goes on to elaborate and conclude that "unlike the results of the present paper, the main results in [another paper which uses a squared error loss, but for location estimation] can be seriously recommended to the practical statistician".


In the case that the true mean $\mu$ is known, the estimator with division by $n+2$ is admissible under squared error loss. This is reviewed in the above article by Stein and credited to Hodges and Lehmann (1951).

  • I found a free version of the article: https://apps.dtic.mil/dtic/tr/fulltext/u2/1028390.pdf. – Dave May 10 '21 at 16:19
  • Super interesting answer! My immediate thought was that the scale invariant estimator would be admissible in this problem, by analogy with what happens with the location-invariant estimator of the mean in low dimensions, while in higher dimensions it would not be admissible. – guy May 10 '21 at 16:35
  • Thank you. I am tempted to ask if any estimator of the form $c\sum_i (X_i-\overline X)^2$ is admissible. And is there any discussion in this paper of the same problem for the multivariate normal? – StubbornAtom May 10 '21 at 16:56
  • @Dave That looks like a different article. – StubbornAtom May 10 '21 at 16:56
  • @StubbornAtom Irritatingly similar titles... – Dave May 10 '21 at 16:58
  • @StubbornAtom You've already found the best estimator of that form, and this shows it's inadmissible. Therefore all such estimators are inadmissible. – user257566 May 10 '21 at 18:33
  • @guy If you're interested in further work, Maatta and Casella wrote a review article of this problem in Statistical Science, see https://projecteuclid.org/journals/statistical-science/volume-5/issue-1/Developments-in-Decision-Theoretic-Variance-Estimation/10.1214/ss/1177012263.full – user257566 May 10 '21 at 18:41
  • Following https://projecteuclid.org/euclid.bsmsp/1200500216, it seems that for $\mu=0$, the estimator $\frac1{n+2}\sum_i X_i^2$ is admissible under a scale-invariant squared error loss. Will it also be admissible under squared error loss? – StubbornAtom May 10 '21 at 19:49
  • @StubbornAtom In one dimension ($p=1$ variable), the squared error loss $(s^2 - \sigma^2)^2$ and the scale-invariant loss $(s^2/\sigma^2-1)^2$ are equivalent. The remarks people make are focused on fundamental extensions of this problem to $p>1$ variables. In that case, the natural generalizations of both of these loss functions are NOT scale invariant. However, Stein's loss function $\operatorname{tr}(\Sigma^{-1} S) - \log\det(\Sigma^{-1} S) - p$ is scale invariant and typically the loss function of choice. – user257566 May 10 '21 at 20:40
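(To make the last comment's formula concrete, here is a small sketch of Stein's loss; the matrices are made-up examples, not from any reference.)

```python
import numpy as np

def stein_loss(S, Sigma):
    """Stein's loss: tr(Sigma^{-1} S) - log det(Sigma^{-1} S) - p."""
    p = Sigma.shape[0]
    M = np.linalg.solve(Sigma, S)   # Sigma^{-1} S without forming the inverse
    return np.trace(M) - np.linalg.slogdet(M)[1] - p

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])      # "true" covariance (made up)
S = np.array([[2.3, 0.4],
              [0.4, 0.8]])          # an estimate (made up)

print(stein_loss(Sigma, Sigma))     # exactly 0 when the estimate equals the truth
print(stein_loss(S, Sigma))         # positive otherwise
# scale invariance: rescaling S and Sigma together leaves the loss unchanged
print(np.isclose(stein_loss(3 * S, 3 * Sigma), stein_loss(S, Sigma)))
```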