
Yes, I'm aware that there are similar/duplicate questions already open:

But funnily enough, NONE of these questions has an answer marked as the right one, so this question is, technically, still unsolved on this site. I have read the answers and other related questions, but I don't see the difference yet, so I will try to explain why I think these are equivalent expressions.

Given a confidence interval $CI_D$ for some unknown fixed parameter $\theta$ calculated from a dataset $D$, why aren't these three sentences equivalent?

  1. If I repeat the experiment infinitely many times and get infinitely many calculated confidence intervals, 95% of them will contain $\theta$.
  2. There's a 95% probability that $CI_D$ contains $\theta$.
  3. There's a 95% probability that $\theta$ is in $CI_D$.
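(For concreteness, sentence 1 can be checked by simulation. This is just a sketch: the z interval for the mean of a Normal population with known unit variance, and every numeric value below, are illustrative choices, not part of the question.)

```python
import math
import random

# Simulate sentence 1: repeat the experiment many times, build a 95% CI each
# time, and count how often the interval contains the fixed true theta.
random.seed(1)
theta, n, z = 3.0, 25, 1.959964   # true mean, sample size, 97.5% normal quantile
reps = 100_000
covered = 0
for _ in range(reps):
    xbar = sum(random.gauss(theta, 1) for _ in range(n)) / n
    half = z / math.sqrt(n)       # interval half-width (sigma = 1 known)
    covered += xbar - half <= theta <= xbar + half
print(covered / reps)             # long-run fraction close to 0.95
```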

Why do I think these sentences are equivalent?

  • Sentences 1 and 2 are equivalent for me because I don't know if $CI_D$ is one of the 95% of intervals that contain $\theta$; this is the same as saying that $CI_D$ has a 95% probability of being one of the intervals that contain $\theta$, which is the same as saying that there's a 95% probability that $CI_D$ contains $\theta$.

  • Sentences 2 and 3 are equivalent because "$CI_D$ contains $\theta$" is equivalent to saying "$\theta$ is in $CI_D$", because both sentences are translated as $\theta\in CI_D$. So $P[CI_D\ contains\ \theta] = 0.95$ is the same as saying $P[\theta\ is\ in\ CI_D] = 0.95$, because they are both the same as saying $P[\theta\in CI_D]=0.95$.

Am I right?

I know that frequentists don't allow saying "probability of a fact about $X$" when $X$ is not a random variable (and $\theta$ is not, because it's a constant), but $CI_D$ is, and sentences 2 and 3 speak about the probability of the relationship between $\theta$ and $CI_D$. So I'm not fully convinced that "probability of $\theta\in CI_D$" goes against the fact that $\theta$ is not a random variable, because $CI_D$ is a random variable and is also present in the same sentence (the sentence tells something about $\theta$, but it also tells something about $CI_D$).

Peregring-lk
    Bear in mind that the "accepted" answer is just the opinion of the initial questioner; there are many cases on this site where someone asks a question but then never accepts an answer (despite receiving many good ones). – Ben Feb 12 '22 at 20:29
  • Your #1 is false for the reasons given in my answer. If the method does not give nominal coverage over the whole parameter space then #1 cannot be true for any number of repeats of the experiment because the fixed $\theta$ will not be spread over the whole parameter space. – Michael Lew Feb 12 '22 at 20:40
  • Opinions differ. Many frequentists would agree with your #1. I think many frequentists would object to 'probability' instead of 'confidence' in #2 & #3. // Bayesians would speak of a 'credible' interval or 'posterior probability interval'. // If someone strongly disagrees with an informative prior distribution and there is not much data, then they may believe the resulting Bayesian interval estimate is inferior to a frequentist CI. Some Bayesian interval estimates with non-informative priors are used as frequentist CIs. One example is the Jeffreys CI for binomial proportion. – BruceET Feb 12 '22 at 20:42
  • @BruceET It may be true that many frequentists would agree with the statement, but surely they would be mistaken. Is my explanation insufficient to demonstrate that? – Michael Lew Feb 12 '22 at 20:53
  • Your explanation seems sufficient to state your opinion. – BruceET Feb 12 '22 at 20:58
  • (1) makes no sense, because of its invalid use of infinities, but its intention to interpret probability as a limiting relative frequency is clear, and if we take "95%" to mean "*exactly* 95%" then (1), (2), and (3) are logically equivalent. But *none* of them is a necessarily correct characterization of a confidence interval, for the simple reason that there exist many applications in which this common probability does *not* equal the nominal 95% value. – whuber Feb 12 '22 at 21:06
  • "But funny enough, NONE of these questions has any answer marked as the right one" -- the accepted answer is the one the asker thought was the best. That does not in any sense imply it is correct; the asker is usually very poorly placed to judge that. Upvotes are slightly better, but a question that's seen a lot of "outside" traffic (has hit the hot questions queue for example) can also end up with very wrong answers getting the most upvotes; later, correct answers may get little recognition. ... ctd – Glen_b Feb 13 '22 at 05:53
  • ctd ... You should rely much more on the quality of the *explanations* of why the answer is correct or why other answers are not. Comments often give very valuable pointers as well. You should not view OP's acceptance as dispositive and only defer to a limited extent to upvotes. User reputation can sometimes be somewhat of a guide, but (i) it, too is an imperfect instrument (high upvotes per answer for someone with a lot of answers is better) and (ii) we are all of us imperfect, so even a good answerer can make a mistake. – Glen_b Feb 13 '22 at 05:54
  • Even if the lack of acceptance indicated a question did not have a correct answer, a second question asking the same thing is still a duplicate and should be closed as such. Are you asking here the same thing as any of those questions, or are you not? – Nij Feb 13 '22 at 08:35

3 Answers


The events in statements 2 and 3 are obviously equivalent --- I interpret them as $CI_D \ni \theta$ and $\theta \in CI_D$ respectively. The issue here is that you are vague about whether you are talking about CIs as random intervals or as fixed intervals after the observed data has been substituted, and you are also vague about whether you are talking about conditional or unconditional probability. Below I will show which mathematical statements about confidence intervals are true/false. So long as you describe these statements correctly in a textual sense (which requires more explicit specification of some issues you're glossing over) you should be fine.


Probabilistic properties of the CI: I'll conduct a purely probabilistic analysis of confidence intervals as mathematical objects, so I'll examine probability statements applying to these objects that are both conditional and unconditional on $\theta$. Note that in the classical framework, the parameter is treated as an "unknown constant" so we (implicitly) condition on it in all probability statements in that context. Nevertheless, I'll look at things more broadly so that you can see what probabilistic statements are true/false within a generalised framework where you examine the CI on a purely mathematical basis.

In order to show you which statements about confidence intervals are true/false, we will use more detailed notation. Let $\text{CI}_\theta(\mathbf{X}, \alpha)$ denote the $1-\alpha$ level confidence interval for $\theta \in \Theta$ using (random) data vector $\mathbf{X}$. This object is a mapping $\text{CI}_\theta: \mathbb{R}^n \times [0,1] \rightarrow \mathfrak{p}(\mathbb{R})$ that maps an input data vector and significance value to a measurable subset of the real numbers. (For a confidence interval the output of the function is a single connected interval, but you can generalise to confidence sets if you want to remove this restriction.) As I've noted in several other answers (some for questions you link to), an exact confidence interval is defined by the following property:

$$\mathbb{P}(\theta \in \text{CI}_\theta(\mathbf{X}, \alpha) | \theta) = 1-\alpha \quad \quad \quad \quad \text{for all } \theta \in \Theta.$$

(An approximate confidence interval is one where there is approximate equality, usually relying on asymptotic distributional results.) Substituting the observed data $\mathbf{X}=\mathbf{x}$ then gives the (fixed) confidence interval $\text{CI}_\theta(\mathbf{x}, \alpha)$. To allow us to assess statements about "repeated experiments" we will let $\mathbf{X}_1, \mathbf{X}_2, \mathbf{X}_3, ...$ denote a sequence of IID random vectors with distribution equivalent to the random vector $\mathbf{X}$.

So, assuming you are using an exact confidence interval, the following statements are true/false$^\dagger$:

$$\begin{align} \mathbb{P}(\theta \in \text{CI}_\theta(\mathbf{X}, \alpha) | \theta) &= 1-\alpha \quad \quad \quad \quad \quad \quad \quad \quad \text{True} \\[12pt] \mathbb{P}(\text{CI}_\theta(\mathbf{X}, \alpha) \ni \theta | \theta) &= 1-\alpha \quad \quad \quad \quad \quad \quad \quad \quad \text{True} \\[12pt] \mathbb{P}(\theta \in \text{CI}_\theta(\mathbf{X}, \alpha)) &= 1-\alpha \quad \quad \quad \quad \quad \quad \quad \quad \text{True} \\[12pt] \mathbb{P}(\text{CI}_\theta(\mathbf{X}, \alpha) \ni \theta) &= 1-\alpha \quad \quad \quad \quad \quad \quad \quad \quad \text{True} \\[12pt] -------------&---------------- \\[6pt] \mathbb{P}(\theta \in \text{CI}_\theta(\mathbf{x}, \alpha) | \theta) &= 1-\alpha \quad \quad \quad \quad \quad \quad \quad \quad \text{False}^\dagger \\[12pt] \mathbb{P}(\text{CI}_\theta(\mathbf{x}, \alpha) \ni \theta | \theta) &= 1-\alpha \quad \quad \quad \quad \quad \quad \quad \quad \text{False}^\dagger \\[12pt] \mathbb{P}(\theta \in \text{CI}_\theta(\mathbf{x}, \alpha)) &= 1-\alpha \quad \quad \quad \quad \quad \quad \quad \quad \text{False}^\dagger \\[12pt] \mathbb{P}(\text{CI}_\theta(\mathbf{x}, \alpha) \ni \theta) &= 1-\alpha \quad \quad \quad \quad \quad \quad \quad \quad \text{False}^\dagger \\[12pt] -------------&---------------- \\[6pt] \mathbb{P} \bigg( \lim_{k \rightarrow \infty} \frac{1}{k} \sum_{i=1}^k \mathbb{I}(\theta \in \text{CI}_\theta(\mathbf{X}_i, \alpha)) &= 1-\alpha \bigg| \theta \bigg) = 1 \quad \quad \quad \quad \ \ \text{True} \\[6pt] \mathbb{P} \bigg( \lim_{k \rightarrow \infty} \frac{1}{k} \sum_{i=1}^k \mathbb{I}(\theta \in \text{CI}_\theta(\mathbf{X}_i, \alpha)) &= 1-\alpha \bigg) = 1 \quad \quad \quad \quad \quad \ \text{True} \\[6pt] \end{align}$$

If you are working in the classical ("frequentist") context, you can ignore the marginal probability statements here and focus entirely on the conditional probability statements. (In that context the parameter is an "unknown constant" and so all our probabilistic analysis implicitly conditions on it having a fixed value.) As you can see, the remaining distinction that determines whether the statement is true/false is whether you are talking about the "data" in its random sense or fixed sense. You also need to take care to state these mathematical conditions clearly and accurately.


$^\dagger$ Statements listed as $\text{False}$ are statements that are not true in general. These statements may be true "coincidentally" for some specific values of the inputs.
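To make the random-versus-fixed distinction concrete, here is a small simulation sketch. The z interval for a normal mean with known unit variance, and all numeric values, are illustrative assumptions, not part of the answer's framework:

```python
import math
import random

random.seed(42)
theta, n, z = 3.0, 25, 1.959964   # fixed "unknown constant", sample size, quantile

def ci(xs):
    """95% z interval for the mean, sigma = 1 known."""
    xbar = sum(xs) / len(xs)
    half = z / math.sqrt(len(xs))
    return xbar - half, xbar + half

# Random data X: the coverage statement P(theta in CI(X) | theta) = 0.95 holds.
reps = 50_000
hits = 0
for _ in range(reps):
    lo, hi = ci([random.gauss(theta, 1) for _ in range(n)])
    hits += lo <= theta <= hi
print(hits / reps)                # close to 0.95

# Fixed observed data x: the event theta in CI(x) is deterministic.
lo, hi = ci([random.gauss(theta, 1) for _ in range(n)])
print(lo <= theta <= hi)          # a single True or False, not "true 95% of the time"
```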

Ben
  • Your notation is puzzling: could you explain what distinction "$\mid\theta$" is making? I do not expect you to reply "conditional probability," because that is inapplicable: $\theta$ is a parameter, not a random variable, and therefore there is neither a joint distribution of $(\theta,X)$ nor any set of conditional distributions one can invoke. – whuber Feb 12 '22 at 21:31
  • @whuber: My intention is indeed to distinguish conditional/marginal probability statements. This is a purely probabilistic analysis, so there is no reason you can't examine CIs in a context where you assume a joint distribution for $(\theta, \mathbf{X})$ (i.e., treat the parameter as a random variable). This approach encompasses a Bayesian analysis of what happens when using classical confidence intervals, and it also encompasses the conditional case which gives the classical analysis of what happens when using classical confidence intervals. ... – Ben Feb 12 '22 at 21:34
  • In the latter case, where the parameter is an "unknown constant", all probability statements are implicitly conditional on the parameter and so only the conditional probability statements are at issue. – Ben Feb 12 '22 at 21:35
  • At the very least, those distinctions need to be included in the answer. But going off on a Bayesian tangent doesn't look helpful in this context. – whuber Feb 12 '22 at 21:36
  • @whuber: Okay, I've edited to make this clear. I disagree that this is an issue of being Bayesian or not. Questions about the probabilistic properties of CIs (or any other statistical object) are purely probabilistic questions. While the object itself is formed based on a statistical theory, in my opinion it is appropriate to place these in a purely probabilistic setting without concern for whether particular statistical approaches treat things as random variables or constants. The more generalised probabilistic context is more helpful because it covers all cases of possible interest. – Ben Feb 12 '22 at 21:54
  • What troubles me is that the "Frequentists" who are often parodied here on CV are perfectly happy to perform Bayesian analyses in circumstances where $(\theta,X)$ can be modeled as a random variable and suitable prior information for $\theta$ exists. The classical CI solution to the "inverse probability" problem *explicitly* denies the randomness of $\theta.$ To be fair to this solution, we shouldn't introduce assumptions it rejects. – whuber Feb 12 '22 at 22:17
  • Once I have calculated a specific confidence interval $CI$, what is the probability that this specific calculated $CI$ contains the unknown parameter $\theta$? – Peregring-lk Feb 12 '22 at 22:28
  • @whuber: I've just corrected an error in the answer, which might change your mind. As I've pointed out in other answers, the conditional result (which is the frequentist one) is actually stronger than the marginal, and the marginal is also true for any distribution on $\theta$. (Now corrected in the answer.) So I think the fairness issue is probably off the table now. – Ben Feb 12 '22 at 22:40
  • Ok, $\mathbb{P}(\theta \in \text{CI}_\theta(\mathbf{x}, \alpha)) = 1-\alpha$ is an incorrect statement, but what is the value of $\mathbb{P}(\theta \in \text{CI}_\theta(\mathbf{x}, \alpha))$ then? Ok, I know that in this statement there's not a single random variable. All terms are specific terms, but for example, in the Monty Hall problem, each door contains a goat or does not contain a goat, and I can still use probabilities to reason about my best move. So statistics is still a useful tool to work with uncertainties; why can't I use it with confidence intervals? – Peregring-lk Feb 13 '22 at 08:58
  • @Ben no it's not. My $x$ is lowercase. – Peregring-lk Feb 13 '22 at 09:00
  • @Peregring-lk: Ah, yes. Since $\mathbf{x}$ is fixed this is essentially a posterior probability. Its value will depend on the distribution of $\theta$. In the frequentist case where $\theta$ is an unknown constant it is also fixed, so the probability will be either zero or one (depending on whether the deterministic event $\theta \in \text{CI}_\theta(\mathbf{x}, \alpha)$ is true or false). – Ben Feb 13 '22 at 09:03
  • @Ben Ok, I'm starting to get it, but how would a frequentist deal with the Monty Hall problem? Or would they just say: the goats are already there, so the probability is either 0 or 1, and I have no tools to decide which is the best move, because using statistics here makes no sense since everything is already in place without any random variable involved; so I don't like this game, I always make a bad move and I'm a loser. Frequentists must have some way to deal with this kind of problem, right? – Peregring-lk Feb 13 '22 at 09:27

Sentence 1 is the de jure interpretation of confidence intervals. I like to say it as follows:

The 95% in "95% confidence interval" refers to the long term relative frequency of these estimators containing the true estimand upon repeated construction under ideal and identical circumstances.

Before embarking, I'd like to point out that $ \theta \in CI_D$ can be interpreted either as $\theta$ being in $CI_D$, as in sentence 3, or as $CI_D$ covering $\theta$, as in sentence 2. Hence, sentences 2 and 3 are (from where I stand) equivalent. Let's discuss why sentences 1 and 2 (or 1 and 3) are thus not equivalent.

The statement made in sentence 2 seems to imply that $\theta$ is the random quantity. Frequentists treat estimands as fixed, so sentences 1 and 2 are not equivalent on that basis alone. Now, people will often defend interpreting confidence intervals in the style of sentence 2 (or 3) by referencing the estimator's long-term relative frequency, but I think this is an error. Any given confidence interval either contains the estimand (assuming it is fixed) or it does not. There is no probabilistic element to this, unless we were to repeat the construction of the interval with different data (which would appeal to the definition). It's like asking "what is the probability that the card on the top of this deck is an ace?". The card is either an ace or it isn't; there is no randomness here. When people answer "the probability is 4/52", what they really mean is "were I to examine an infinite sequence of well-shuffled decks, then 4/52 of the decks would have an ace on top". These are two very different scenarios.
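The card analogy is easy to simulate (a sketch; the deck encoding, seed, and repetition count are arbitrary choices):

```python
import random

random.seed(0)
deck = ["A"] * 4 + ["x"] * 48     # 4 aces among 52 cards

# Long-run relative frequency across freshly shuffled decks: about 4/52.
reps = 200_000
aces = 0
for _ in range(reps):
    random.shuffle(deck)
    aces += deck[0] == "A"
print(aces / reps)                # close to 4/52, i.e. about 0.077

# One fixed, already-shuffled deck: the top card simply is or isn't an ace.
random.shuffle(deck)
print(deck[0] == "A")             # True or False, with no 4/52 about it
```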

That's my argument, let's take a look at some of yours:

Sentences 1 and 2 are equivalent for me because I don't know if $CI_D$ is one of the 95% of intervals that contains $\theta$, which is the same as saying that $CI_D$ has a probability of 95% of being one of intervals that contains $\theta$, which is the same as saying that there's a probability of 95% that $CI_D$ contains $\theta$.

Here, I think you are making the mistake I spoke of earlier with the decks. The interval is fixed, the parameter is fixed, and the interval either covers it or it does not. This means the probability is either 0 or 1, but we don't know which. All we know is the long-term relative frequency of intervals capturing the estimand under ideal conditions, and we just hope that this happens to be one of those times where our interval covers the estimand.

Sentences 2 and 3 are equivalent because "$CI_D$ contains $\theta$" is equivalent to saying "$\theta$ is in $CI_D$", because both sentences are translated as $\theta\in CI_D$. So $P[CI_D\ contains\ \theta] = 0.95$ is the same as saying $P[\theta\ is\ in\ CI_D] = 0.95$, because they are both the same as saying $P[\theta\in CI_D]=0.95$.

I disagree with the statement $P[CI_D\ contains\ \theta] = 0.95$ because, as noted before, any given interval either covers the estimand or it does not. The probabilistic statement is about infinite sequences of intervals.

If you're interested, I wrote a little about this here on my blog.

Demetri Pananos
  • At the outset, you assume the coverage probability and the nominal size of the confidence interval are the same. *Usually they are not.* The commonest exceptions are for composite null hypotheses (e.g., one-sided tests of means) and discrete statistical distributions (e.g., Binomial CIs). – whuber Feb 12 '22 at 21:07
  • "The interval is fixed, the parameter is fixed, the interval either covers it or it does not." Ok, but if I don't know whether it covers it or not... there is nothing I can say about its uncertainty? – Peregring-lk Feb 12 '22 at 21:52
  • @Peregring-lk You can't say if the interval covers the estimand or not (else we wouldn't need inference). The only thing you know is the coverage (and even then there are some doubts, as whuber mentions). You hope that *this* interval is one of the intervals that covers the estimand (and ostensibly there is good reason to believe this assuming things are run without any bias, etc.) and act accordingly. – Demetri Pananos Feb 12 '22 at 21:59
  • @DemetriPananos I "hope"; can't I quantify the degree of hope? If the interval either covers it or not and I don't know with what likelihood either event can happen, calculating a confidence interval is completely worthless because I have no guarantee of anything, right? – Peregring-lk Feb 12 '22 at 22:22
  • @DemetriPananos it's funny that I have more precise information about the method BEFORE I have calculated any specific interval (95% likelihood of giving an interval covering the parameter) than AFTER I have calculated a specific interval (it might contain the parameter, it might not, and you can't even know with what probability). Calculating a specific interval from a specific dataset has destroyed statistical information. What? – Peregring-lk Feb 12 '22 at 22:26
  • The confidence level is *exactly* the thing that quantifies the degree of hope. It might help to explore the other threads about this topic a little more deeply. – whuber Feb 12 '22 at 22:47
  • This discussion reminds me about [my answer to another question](https://puzzling.stackexchange.com/a/114196/1649), where it depends whether you assume that some underlying fixed parameter is included in the probability space or not. I hope my answer there can help (there, if $p$ is assumed fixed, we don't know the probability of the final result, but if we put a prior to $p$, then we can). Also note that probability depends on the probability space, i.e., the process that describes the system. The chord on a circle is a famous example. – justhalf Feb 13 '22 at 05:51

The key thing to know is that a frequentist confidence interval says nothing directly about the unknown fixed value of $\theta$, the parameter of interest for the experiment that you ran. Instead, the frequentist confidence interval is the result of a method that yields intervals with, on average, the nominal coverage (with a few small caveats) when applied in the long run to analyses of experiments for all possible values of $\theta$.

Some types of frequentist confidence interval yield exactly the nominal coverage for all possible values of the parameter of interest (e.g. the ordinary Student's t confidence interval for the mean of a normally distributed population, estimated from a random sample). If your interval is of that type then you can indirectly infer something about the probability of your fixed unknown instance of the parameter falling within the interval. But that probability is not a 'proper' frequentist probability, because the 'proper' probability is either one or zero. Of course, even that indirect inference is only well-formed if you know for certain that your data were obtained in circumstances that satisfy the distributional and sampling assumptions of the method.
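As a sketch of that first type, a simulation of the ordinary Student's t interval shows coverage near the nominal 95% regardless of where the fixed parameter sits. The sample size, seed, population standard deviation, and grid of true means below are all arbitrary illustrative choices:

```python
import math
import random

random.seed(7)
n = 10
t_crit = 2.262                    # ~97.5% quantile of Student's t with 9 df
reps = 20_000

results = {}
for theta in (-5.0, 0.0, 1.5, 100.0):   # fixed "true" means to test
    covered = 0
    for _ in range(reps):
        xs = [random.gauss(theta, 2.0) for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
        half = t_crit * s / math.sqrt(n)
        covered += xbar - half <= theta <= xbar + half
    results[theta] = covered / reps
print(results)                    # every entry close to 0.95
```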

Other types of frequentist interval do not have uniformly nominal coverage. Cases where the data are discrete provide useful examples. See here, for example: Discrete functions: Confidence interval coverage? If your fixed unknown value of $\theta$ lies in an 'unlucky' region of parameter space then the coverage of the method can be remarkably far from nominal.
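A small exact computation illustrates this for a discrete case (a sketch; n = 20 and the Wald interval for a binomial proportion are illustrative choices, and other discrete intervals show related behaviour):

```python
import math

n, z = 20, 1.959964               # sample size and 97.5% normal quantile

def wald_covers(k, p):
    """Does the 95% Wald interval built from k successes cover p?"""
    phat = k / n
    half = z * math.sqrt(phat * (1 - phat) / n)
    return phat - half <= p <= phat + half

def coverage(p):
    """Exact coverage: sum the binomial pmf over the k whose interval covers p."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n + 1) if wald_covers(k, p))

for p in (0.05, 0.10, 0.30, 0.50):
    print(p, round(coverage(p), 3))   # far below 0.95 when p is near the edge
```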

Where the coverage is not exactly nominal over all possible values of the parameter, you cannot assume that the coverage is nominal for your fixed unknown value of the parameter, and so you cannot make any exact statement about the probability that your value lies inside the interval.

If you need to know the probability that an interval contains the unknown value of $\theta$ then you need to use a method that allows direct specification of probabilities of non-random parameters. In other words, you need Bayesian probabilities.

It is true that Bayesian credible intervals are very often similar (or identical) to some frequentist intervals (as long as the priors allow), and so Bayesian statements about the presence of fixed parameter values within frequentist intervals are often not too misleading, even if they are incorrect.

Michael Lew
  • This is a standard straw man argument about confidence intervals. I am unaware of any *good* classical textbook that would assert the probabilities it is analyzing are "either one or zero." That sounds like an (invalid) Bayesian re-interpretation of the situation. An otherwise excellent collection of observations is flawed by this circularity. – whuber Feb 12 '22 at 21:09
  • @Whuber Seems to me that you made several of the same points as me in your answer here: https://stats.stackexchange.com/questions/11856/how-to-interpret-confidence-interval-of-the-difference-in-means-in-one-sample-t/12085#12085 I don't see how your quibble with my suggestion about a 'proper' frequentist probability regarding coverage of a particular fixed value is relevant to the rest of my answer. – Michael Lew Feb 13 '22 at 02:37