7

Let $X_1,...,X_n$ be a random sample from $\text{Uniform}(\theta -1/2, \theta +1/2).$ I need to find a confidence interval for $\theta$ with confidence level $1-\alpha$.

I have this:

$\max(X_i)-1/2<\theta<\min(X_i) +1/2,$ hence I say $\mathbb{P}(\max (X_i)-1/2<\theta<\min(X_i) +1/2)\ge 1-\alpha $

or using the statistics $\min(X_i) , \max(X_i),$

$\mathbb{P}(\min(X_i) \le\theta\le\max(X_i))\ge 1-\alpha$
$\mathbb{P}(\theta\le \max(X_i)) -\mathbb{P}(\min(X_i) \le\theta)$ $\ge 1-\alpha. $

BruceET
Hug ma
  • The interval that you are proposing does not depend on $\alpha$ – Misius Jun 14 '21 at 20:49
  • A continuation of https://math.stackexchange.com/questions/4172924/interval-estimator-for-uniform – Henry Jun 14 '21 at 20:50
  • https://math.stackexchange.com/q/3230142/321264 – StubbornAtom Jun 15 '21 at 13:22
  • You are correct to use $\min(X_i),\max(X_i).$ What remains is to find the distributions of the min and the max (related to beta distributions). – BruceET Jun 15 '21 at 21:18
  • This is interesting. For the uniform 0, $\theta$ case, the max is the sufficient statistic. I think that the minimum is an ancillary statistic. Huzurbazar shows that the unbiased estimator for the support beats the Cramer Rao lower bound (which the Rao Blackwell theorem never applied since the support depends on $\theta$). I'll edit my answer to show only the max matters. – AdamO Jun 15 '21 at 22:09
  • 1
    $(X_{(1)}+X_{(n)})/2-\theta$ is a pivotal quantity that you can use to construct the interval. See https://stats.stackexchange.com/questions/352854/confidence-interval-for-mean-of-a-uniform-distribution/352879#352879 for a similar approach. – Jarle Tufto Jun 16 '21 at 18:05

3 Answers

5

I too will avoid analytic derivations of distributions of $\max(X_i)$ and $\min(X_i)$ because I guess that is the main point of this assignment. [Also see this page.]

However, results from a simulation for the case $n = 20, \theta = 10,$ based on the sample midrange, are shown below.

set.seed(2021)
n=20
mr = replicate(10^6, mean(range(runif(n, 9.5, 10.5))))
CI = quantile(mr, c(.025,.975)); CI
     2.5%     97.5% 
 9.930509 10.069571 

hdr = "Simulated Distributions of Midrange"
hist(mr, prob=T, br=50, col="skyblue2", main=hdr)
 abline(v = CI, col="red", lwd=2, lty="dashed")

[Figure: histogram of the simulated midrange values, with the 2.5% and 97.5% quantiles marked by dashed red vertical lines]
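To turn simulated quantiles like these into an interval for an observed sample, one option is to simulate the pivot (midrange minus $\theta$), whose distribution does not depend on $\theta$, and subtract its quantiles from the observed midrange. The sketch below assumes this pivot approach; the sample `x` and the names `piv`, `q`, and `ci` are made up for illustration.

```r
set.seed(2021)
n <- 20
alpha <- 0.05

# Simulate the pivot (midrange - theta). Its distribution is free of theta,
# so theta = 0 is used here without loss of generality.
piv <- replicate(10^5, mean(range(runif(n, -0.5, 0.5))))
q <- unname(quantile(piv, c(alpha/2, 1 - alpha/2)))

# Hypothetical observed sample (theta = 10 is assumed only to generate data)
x <- runif(n, 9.5, 10.5)

# q[1] < midrange - theta < q[2]  <=>  midrange - q[2] < theta < midrange - q[1]
ci <- mean(range(x)) - rev(q)
ci
```

For $n = 20$ the width of `ci` should come out close to the 0.139 spread between the two simulated quantiles shown above.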

BruceET
4

Definitely not the most efficient approach (doesn't condition on the sufficient statistic), but the mean is unbiased for $\theta$. And the variance of a single observation from the above distribution is 1/12. So, by the CLT:

$$ \sqrt{n} \left( \bar{X} - \theta \right) \rightarrow_d \mathcal{N}\left(0, 1/12 \right)$$

Which means that a $1-\alpha$ confidence interval for $\theta$ can be given by

$$\left(\bar{X}+(12n)^{-.5}\mathcal{Z}_{\alpha/2}, \bar{X}+(12n)^{-.5} \mathcal{Z}_{1-\alpha/2} \right)$$
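As a minimal sketch of how this interval could be computed in R, assuming a made-up sample `x` generated with $\theta = 10$ (the seed, sample, and variable names are illustrative only):

```r
set.seed(1)                    # arbitrary seed, for reproducibility only
n <- 20
alpha <- 0.05
x <- runif(n, 9.5, 10.5)       # hypothetical sample; theta = 10 is an assumption

# Standard deviation of the sample mean, since Var(X_i) = 1/12
se <- 1 / sqrt(12 * n)

# qnorm(alpha/2) is negative, so the first limit lies below the mean
ci_clt <- mean(x) + qnorm(c(alpha/2, 1 - alpha/2)) * se
ci_clt
```

The width of this interval depends only on $n$ and $\alpha$, not on the data, because the variance $1/12$ is known.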

AdamO
  • 1
    Right about 'not the most efficient'. For $n = 5, 20$ your CIs are of widths $0.506, 0.253,$ respectively; while CIs based on max and min have average widths $0.451, 0.139,$ respectively. – BruceET Jun 15 '21 at 21:17
  • @BruceET well using the min and max would be biased. Otherwise, every instance of the German Tank problem would claim you have the last tank that was made. – AdamO Jun 15 '21 at 22:03
  • 1
    Not advocating use of max or min, but of midrange (their average). So if a sample of size 20 has midrange 9.95, the 95% CI would be approx (9.89, 10.02) of length 0.14. – BruceET Jun 15 '21 at 23:08
  • Thanks, that was so helpful. – Hug ma Jun 16 '21 at 00:25
4

Frequentist interval via pivotal quantity: An improvement over the interval suggested by @AdamO, though still a suboptimal solution, can be obtained using almost exactly the same method as the one I give here, so I omit the details of the derivation. The pdf of $$ Z=\frac{X_{(1)}+X_{(n)}}2-\theta $$ is $$ f(z)=n(1-2|z|)^{n-1} $$ for $-1/2 \le z \le 1/2$. Since the distribution of $Z$ doesn't depend on $\theta$, $Z$ is a pivotal quantity.

This pdf is symmetric and the upper $\alpha/2$-quantile of $Z$ is $\frac{1-\alpha^{1/n}}2$. Thus $$ P\left(-\frac{1-\alpha^{1/n}}2<\frac{X_{(1)}+X_{(n)}}2-\theta<\frac{1-\alpha^{1/n}}2\right)=1-\alpha. $$ Inverting the double inequality, we find that $$ \frac{X_{(1)}+X_{(n)}}2 \pm \frac{1-\alpha^{1/n}}2 \tag{1} $$
is a $1-\alpha$ confidence interval for $\theta$. The midrange $(X_{(1)}+X_{(n)})/2$ is not a sufficient statistic for $\theta$, however.
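The quantile quoted above can be checked directly from the pdf. A short worked calculation, where $q$ is just a label introduced here for the upper $\alpha/2$-quantile (so $0 \le q \le 1/2$):

$$
P(Z>q)=\int_q^{1/2} n(1-2z)^{n-1}\,dz=\frac{(1-2q)^n}{2}=\frac{\alpha}{2}
\quad\Longrightarrow\quad q=\frac{1-\alpha^{1/n}}{2}.
$$

For example, with $n=5$ and $\alpha=0.05$ this gives $q\approx 0.225$, so interval (1) has fixed length $2q\approx 0.451$.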

Whittinghill and Hogg: As pointed out by @COOLSerdash, by inverting a likelihood ratio statistic these authors derive the interval $$\left(x_{(n)}-\frac{(1-\alpha)^{1/n}}2,x_{(1)}+\frac{(1-\alpha)^{1/n}}2\right)\tag{2}$$ which is a function of the sufficient statistic $(X_{(1)},X_{(n)})$ for $\theta$. However, simulations (see below) suggest that this interval is also suboptimal.
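One way to see that interval (2) has exact coverage, writing $c=(1-\alpha)^{1/n}/2$ as shorthand introduced here: the interval contains $\theta$ exactly when every observation falls within $c$ of $\theta$, and each $X_i$ does so with probability $2c$ because $c\le 1/2$. Hence

$$
P\left(X_{(n)}-c<\theta<X_{(1)}+c\right)
=P\left(\theta-c<X_i<\theta+c \text{ for all } i\right)
=(2c)^n=1-\alpha.
$$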

A Bayesian credible interval: An alternative is to represent our prior ignorance about $\theta$ by a uniform improper prior $\pi(\theta)=1$. The posterior density of $\theta$ is then $$ \pi(\theta|\mathbf{x})\propto \prod_{i=1}^n I_{(\theta-\frac12,\theta+\frac12)}(x_i)=I_{(x_{(n)}-\frac12,x_{(1)}+\frac12)}(\theta), $$ that is, conditional on the observations, $\theta$ is uniformly distributed on the interval $(x_{(n)}-\frac12,x_{(1)}+\frac12)$. A $1-\alpha$ credible interval for $\theta$ is therefore $$ \left(x_{(n)}-\frac12 + \frac{\alpha}2L, x_{(1)}+\frac12 - \frac{\alpha}2L\right) \tag{3} $$ where $L=1-(x_{(n)}-x_{(1)})$. Interestingly, judged by frequentist criteria, based on the following simulation, this interval appears to have the exact nominal coverage but is shorter on average than both (1) and (2) (considerably shorter than (1), and slightly shorter than (2)):

ci.normal <- function(x, alpha) {
  n <- length(x)
  mean(x) + c(-1,1)*(12*n)^(-.5)*qnorm(alpha/2, lower.tail = FALSE)
}
ci.pivot <- function(x, alpha=.05) {
  n <- length(x)
  (min(x)+max(x))/2 + c(-1,1)*(1 - alpha^(1/n))/2
}
ci.wh <- function(x, alpha) {
  n <- length(x)
  half <- (1-alpha)^(1/n)/2   # named 'half' to avoid shadowing base::c()
  c(max(x)-half, min(x)+half)
}
ci.bayes <- function(x, alpha=.05) {
  L <- 1 - (max(x)-min(x))  
  c(max(x) - .5 + L*alpha/2, min(x) + .5 - L*alpha/2)
}
coverage <- function(fn, theta=0, nsim=100000, n, alpha=0.05) {
  hits <- 0
  ci.lengths <- numeric(nsim)
  for (i in 1:nsim) {
    x <- runif(n, theta-.5, theta+.5) 
    ci <- fn(x,alpha)
    ci.lengths[i] <- ci[2] - ci[1]
    if (ci[1] < theta && ci[2] > theta)
      hits <- hits + 1
  }
  list(coverage = hits/nsim, meanlength = mean(ci.lengths))
}
> coverage(ci.normal, n=5)
$coverage
[1] 0.95315

$meanlength
[1] 0.5060605

> coverage(ci.pivot, n=5)
$coverage
[1] 0.95004

$meanlength
[1] 0.4507197

> coverage(ci.wh, n=5)
$coverage
[1] 0.94968

$meanlength
[1] 0.3226174

> coverage(ci.bayes, n=5)
$coverage
[1] 0.94991

$meanlength
[1] 0.3169024
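
As a rough cross-check of the simulated mean lengths, the expected lengths can also be written in closed form, using the standard result that the range of $n$ uniform observations on an interval of width 1 has mean $(n-1)/(n+1)$. A minimal sketch; the variable names are chosen here for illustration:

```r
n <- 5
alpha <- 0.05

# Expected sample range for Uniform(theta - 1/2, theta + 1/2)
expected_range <- (n - 1) / (n + 1)

# Normal approximation: fixed length 2 * z / sqrt(12 n)
len_normal <- 2 * qnorm(1 - alpha/2) / sqrt(12 * n)

# Interval (1): fixed length 1 - alpha^(1/n)
len_pivot <- 1 - alpha^(1/n)

# Interval (2): length (1 - alpha)^(1/n) - range
len_wh <- (1 - alpha)^(1/n) - expected_range

# Interval (3): length (1 - alpha) * L, with L = 1 - range
len_bayes <- (1 - alpha) * (1 - expected_range)

round(c(normal = len_normal, pivot = len_pivot, wh = len_wh, bayes = len_bayes), 4)
```

These values should agree with the simulated `meanlength` figures above up to Monte Carlo error.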
Jarle Tufto