
With most pivotal quantities I can think of, after we rearrange things to get an interval, the interval's "location" depends on a random point estimate. In the example below, the interval's midpoint is the sample mean.

Is there a way to guarantee that we end up with an interval centered at $0$? By "centered at $0$" I mean an interval of the form $(-c,c)$ for some fixed $c \in \mathbb{R}^+$.

For example, with a normal random sample, we could start with the pivotal quantity $\sqrt{n}(\bar{x} - \mu)/s$ and rearrange the inequalities inside

$$ 1 - \alpha = \mathbb{P}\left(-t_{\alpha/2,n-1} \le \sqrt{n}(\bar{x} - \mu)/s \le t_{\alpha/2,n-1} \right) $$

into

$$ \mathbb{P}\left(\bar{x}-\frac{s}{\sqrt{n}}t_{\alpha/2,n-1} \le \mu \le \bar{x}+\frac{s}{\sqrt{n}}t_{\alpha/2,n-1}\right). $$
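For concreteness, a quick R version of this standard interval (the function name is mine); note the midpoint is always $\bar{x}$:

# equal-tailed t-interval computed from the pivotal quantity above
t_interval <- function(x, alpha = 0.05) {
  n <- length(x)
  half_width <- qt(1 - alpha/2, df = n - 1) * sd(x) / sqrt(n)
  mean(x) + c(-1, 1) * half_width  # centered at the sample mean, not at 0
}
t_interval(rnorm(10, mean = 1))  # a random interval centered at xbar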

Is there a way we could scoot the interval away from $\bar{x}$ and then fix its (random) width so that we still end up with $95\%$ coverage?

I suspect it isn't possible. If I'm not changing the pivotal quantity, all I can do is pick the quantiles. This example is an "equal-tailed" interval. But how can I deterministically pick quantiles, by picking two positive numbers that sum to $\alpha$, based on a random data set I haven't obtained yet?

Taylor
  • Confidence interval for what parameter? What distribution family? Why centered at 0? Indeed, what do you mean by this phrase? – whuber Sep 14 '21 at 16:59
  • “The interval is centered at a point estimate. Is there a way to end up with an interval centered at 0?” You answered yourself: you just need the point estimate to be zero. – Tim Sep 14 '21 at 17:19
  • The point estimate does not have to be in the center, such as a confidence interval for variance. `set.seed(2021); N` – Dave Sep 14 '21 at 19:55
  • @whuber see edits. (-c,c) for some positive c. – Taylor Sep 14 '21 at 22:21
  • @Tim I need to fix the center with probability 1. – Taylor Sep 14 '21 at 22:21
  • @Dave right see edits. The problem is I want to choose the center of the interval instead of letting it be dictated by the data randomly. – Taylor Sep 14 '21 at 22:21
  • If $[L,U]$ is a confidence interval for your parameter of size $\alpha$ and you let $c=\min(|U|,|L|)$ then $[-c,c]$ is a confidence interval, so all you need do is compute the size of this interval. This likely won't always work (the size might be much larger than $\alpha$), but it will work in the example you provide. – whuber Sep 15 '21 at 13:01
  • @whuber But won't that construction over-cover? E.g., the coverage at 0 will be 100%, as 0 is always included. The OP's construction requires that the interval can be the empty set in order not to get 100% coverage at 0. – innisfree Sep 15 '21 at 15:00
  • @innisfree Yes, most of the time the coverage will be huge. But that is not how the size of the interval is computed: the size is the *smallest* possible coverage. When the parameter equals $0$ and $[L,U]$ is a symmetric interval, the coverage of the $[-c,c]$ confidence interval is the same as that of $[L,U].$ In the Normal-mean case this coverage is exactly $1-\alpha,$ whence the coverage of the $[-c,c]$ procedure is correct. – whuber Sep 15 '21 at 15:06
  • Let me think about your comment about coverage at 0. Actually, I don't understand your suggestion, because I would use $\max(|U|,|L|)$ rather than $\min(|U|,|L|)$. – innisfree Sep 15 '21 at 15:25
  • A confidence interval just needs to achieve $(1-\alpha)\%$ coverage. It can be anywhere. For good measure, though, usually people prefer the C.I. with the smallest span (so it will be around the area of highest density). – Firebug Sep 16 '21 at 11:53
  • The problem is that sometimes there is not enough density on either side of 0 to achieve $(1-\alpha)\%$ coverage, so the problem is unsolvable for some distributions. – Firebug Sep 16 '21 at 12:17

2 Answers


I find this a strange question, but one idea would be to construct the interval from a confidence distribution of the parameter of interest. In the iid normal case for $\mu$, the cdf of one such confidence distribution is $$ F(\mu) = F_{T_{n-1}}\left(\frac{\sqrt{n}(\mu-\bar x)}{s}\right), $$ where $F_{T_{n-1}}$ is the cdf of the Student $t$-distribution with $n-1$ degrees of freedom.

Then, choosing $c$ such that $$ F(c)-F(-c)=1-\alpha, $$ we may hope to achieve coverage of the interval $(-c,c)$ close to the nominal level of $1-\alpha$. This indeed appears to happen for large $\mu/\sigma$ (black curve in the plot below). However, as $\mu$ becomes small, this procedure inevitably leads to an interval that always contains $\mu$, so the coverage tends to $100\%$.

The interval $(-c,c)$, where $c=\max(|U|,|L|)$ and $(L,U)$ is the ordinary Student $t$-interval for $\mu$ (suggested by @whuber in the comments), appears to achieve a confidence level of $1-\alpha$ if the confidence level of $(L,U)$ is taken to be $1-2\alpha$. This makes it always shorter than the interval derived via the confidence distribution, and its coverage also appears to stay closer to the nominal level (red curve in the plot below).

[Figure: simulated coverage versus $\mu/\sigma$ for the two procedures; black: interval from the confidence distribution, red: whuber's construction; dotted lines mark Monte Carlo error bounds around $0.95$.]

R code:

# interval (-c, c) chosen so the confidence distribution assigns it
# probability 1 - alpha
ci0 <- function(x, alpha = .05, upper = 100) {
  n <- length(x)
  xbar <- mean(x)
  s <- sd(x)
  # cdf of the confidence distribution of mu
  cdf <- function(mu) {
    pt(sqrt(n)*(mu - xbar)/s, df = n - 1)
  }
  # solve F(c) - F(-c) = 1 - alpha for c
  f <- function(c) {
    cdf(c) - cdf(-c) - 1 + alpha
  }
  c <- uniroot(f, lower = 0, upper = upper)$root
  c(-c, c)
}
# interval (-c, c) with c = max(|U|, |L|), where (L, U) is the ordinary
# t-interval of level 1 - 2*alpha (whuber's suggestion)
whuber <- function(x, alpha = .05) {
  ci <- t.test(x, conf.level = 1 - 2*alpha)$conf.int
  c <- max(abs(ci))
  c(-c, c)
}
# simulated coverage of the interval produced by fn at a given mu
coverage <- function(fn, mu, sigma = 1, n, nsim = 1e+4, alpha = 0.05) {
  hits <- 0
  for (i in 1:nsim) {
    x <- rnorm(n, mu, sigma)
    ci <- fn(x, alpha = alpha)
    if (ci[1] < mu && ci[2] > mu) {
      hits <- hits + 1
    }
  }
  list(coverage = hits/nsim,
       p.value = binom.test(hits, nsim, p = 1 - alpha)$p.value)
}

m <- 40
mu <- seq(0, 3, length = m)
res1 <- res2 <- numeric(m)
for (i in 1:m) {
  res1[i] <- coverage(ci0, mu[i], n = 10, nsim = 1e+4)$coverage
  res2[i] <- coverage(whuber, mu[i], n = 10, nsim = 1e+4)$coverage
}
plot(mu, res1, xlab = "mu/sigma", ylab = "coverage", type = "l")
lines(mu, res2, col = "red")
# nominal level with approximate Monte Carlo error bounds
abline(h = .95 + sqrt(.95*.05/1e+4)*qnorm(c(.025, .975)), lty = 3)
legend("topright", c("via conf.distr.", "whuber"),
       col = c("black", "red"), lty = 1)
Jarle Tufto
  • This is the most Bayesian frequentist thing I've ever seen, and I love it. Very slick. – Taylor Sep 15 '21 at 22:59
  • Well, it appears that the simpler procedure suggested by @whuber produces a shorter interval with coverage always closer to the nominal level (see my updated answer). – Jarle Tufto Sep 16 '21 at 11:34

I prefer visualizing frequentist inference using a confidence curve.

[Figure: confidence curve for a Bernoulli proportion $p$ based on $6$ events in $n=10$ trials; vertical reference lines mark the equal-tailed $75\%$ confidence interval.]

Without loss of generality we can consider a hypothesis $H_0: \theta=\theta_0$ (not necessarily zero). Above is a confidence curve for inference on a Bernoulli proportion $p$, obtained by inverting the binomial CDF $F_Y(y; n, p)$, based on $6$ events with a sample size of $n=10$. Considering the hypothesis $H_0: p=0.5$, for this experimental result the upper-tailed p-value is $0.38$. We are therefore $38\%$ confident the unknown fixed true $p$ is less than or equal to $0.5$, and $100(1-0.38)\%=62\%$ confident it is greater than or equal to $0.5$. These confidence levels are nothing more than a restatement of the one-sided p-value testing $H_0: p=0.5$. The vertical reference lines identify the two-sided equal-tailed $75\%$ confidence interval. This is the set of hypotheses that are most plausible given the observed data: those hypotheses that, when tested, are not significant at the two-sided equal-tailed $25\%$ level. For each hypothesis in this interval the observed result is within a $75\%$ margin of error. Here is a related post.
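These numbers are easy to check numerically. Here is a short R sketch (mine, not part of the original answer), using the fact that binom.test reports the Clopper-Pearson interval obtained by inverting the binomial CDF:

y <- 6; n <- 10
# upper-tailed p-value testing H0: p = 0.5, i.e. P(Y >= 6 | p = 0.5)
1 - pbinom(y - 1, n, 0.5)  # 0.377, the "38% confident p <= 0.5" above
# equal-tailed 75% interval from inverting the binomial CDF
binom.test(y, n, conf.level = 0.75)$conf.int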

For the given experimental result, by considering a margin of error that does not have equal tails, you can construct a $95\%$ confidence interval that is centered at $p=0.5$.

For the same experimental result, here is another confidence curve, centered at $H_0: p=0.5$. The maximum likelihood estimate is still $\hat{p}=0.6$, but I have placed the peak of the curve at $p=0.5$. The $95\%$ confidence interval centered at $0.5$ is $(0.15, 0.85)$. The upper-tailed p-value testing $H_0: p=0.15$ is $0.001$ and the lower-tailed p-value testing $H_0: p=0.85$ is $0.049$. We are therefore $95\%$ confident the unknown fixed true $p$ is within $(0.15, 0.85)$.
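As a sanity check (my own sketch, not the author's code), one can recover an interval of this form by choosing the half-width $c$ numerically so that the two unequal tail probabilities sum to $0.05$:

y <- 6; n <- 10; alpha <- 0.05
# total tail probability of the interval (0.5 - cc, 0.5 + cc):
# upper-tailed p-value at the lower endpoint plus
# lower-tailed p-value at the upper endpoint
tails <- function(cc) {
  (1 - pbinom(y - 1, n, 0.5 - cc)) + pbinom(y, n, 0.5 + cc)
}
c_star <- uniroot(function(cc) tails(cc) - alpha, c(0, 0.499))$root
0.5 + c(-1, 1) * c_star  # approximately (0.15, 0.85)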

[Figure: confidence curve for the same data with its peak placed at $p=0.5$; the $95\%$ interval centered at $0.5$ is $(0.15, 0.85)$.]

Geoffrey Johnson