Consider a data generating process with a parameter of interest $\theta$. I would like to estimate $\theta$ as precisely as possible and also quantify the estimation imprecision / uncertainty. I obtain an i.i.d. sample of size $n$, from which a confidence interval (CI) for $\theta$ is estimated. The estimator of the CI guarantees that in the long run, 95% of such CIs will include the true parameter value. However, I do not know whether the interval I got from this particular sample includes the true parameter value or not; nor can I make a statement like "the probability that $\theta$ lies in this concrete interval is 95%". This is not quite a satisfactory answer for me. I would like to say more about the estimated confidence interval (the particular realization of the respective estimator).

But is this all that the frequentist paradigm has to offer on this question? For example, couldn't I say that I am 95% confident that this particular interval includes the true value, based on the long-run properties of the estimator? (For that matter, I would bet 95 against 5 that the interval actually includes the true value and consider this to be a fair bet.) Is there a "legitimate" way to express this confidence within the frequentist paradigm, and if so, how should I phrase it?

P.S. Yes, I am aware of Bayesian statistics and the corresponding problem and answer formulations. Nevertheless, this question is intentionally about the frequentist paradigm.

Richard Hardy
  • "For example, couldn't I say that I am 95% confident that this particular interval includes the true value" -- yes, for that particular meaning of *confident*. – Glen_b Aug 21 '18 at 13:02
  • @Glen_b, thank you! Is the particular meaning even defined in my post? I mean, intuitively it is kind of obvious (at least for me), but then how should I formulate it in a clean way (not to clash with the frequentist perspective)? – Richard Hardy Aug 21 '18 at 13:10
  • Yes, it is. "[...] in the long run, 95% of such CIs will include the true parameter value." -- it's in that specific sense that you can talk about *confidence* – Glen_b Aug 21 '18 at 13:12
  • What you actually want is a credible interval, a probabilistic model of what you should believe. First, many credible intervals are very similar to confidence intervals. Second, if you want to bet, you should really include prior knowledge before you place your bet, and that leads to Bayesian statistics. Why do you want to stay with Frequentism, whilst the answers you seek are the ones that Bayes gives? – Bernhard Aug 21 '18 at 13:21
  • @Bernhard, who said I want to stay with Frequentism? :) But that is another topic. Yes, I am aware of the Bayesian solution, but I cannot change the curriculum I am going to teach to my students, so I am trying to squeeze out the most I can from the frequentist paradigm. – Richard Hardy Aug 21 '18 at 13:24
  • Sorry, I thought staying with Frequentism was meant by '[...] to express this confidence within the frequentist paradigm'. I probably got that wrong. Maybe you want to introduce your students to JASP software, where Frequentist and Bayesian solution buttons are next to each other; they can then compare confidence intervals with reasonable credible intervals and get a feeling for how similar the values often are, whilst learning that they are something different. – Bernhard Aug 21 '18 at 13:32
  • @Bernhard, thank you. I will include it in my list of ideas for future changes in the curriculum. Pleased to learn about JASP, too. Regarding staying with frequentism, I did not mean this as a personal preference but rather as a delimitation of the question. – Richard Hardy Aug 21 '18 at 13:55
  • RichardHardy: I remember whuber once described a spinner that returns the real line with 95% probability & the empty set with 5%. It can therefore be used to produce valid confidence intervals for real-valued parameters without giving you the trouble & expense of collecting data. Would you still offer odds of 19:1 against the parameter value's falling outside the confidence interval *after* you'd seen the result of the spin? – Scortchi - Reinstate Monica Aug 21 '18 at 18:24
  • @Scortchi, hmm, how should this example help me? The result of the spin tells me with 100% certainty whether the value is covered, while the estimated CI does not. Seeing the estimated CI does not change my confidence while seeing the result of the spin does. – Richard Hardy Aug 21 '18 at 19:26
  • What estimated CI are you talking about? The one referred to in your question? – Scortchi - Reinstate Monica Aug 21 '18 at 19:36
  • @Scortchi, yes, that one (not the one generated by the spinner). – Richard Hardy Aug 21 '18 at 20:03
  • So it's not merely because a procedure gives intervals with 95% coverage that you'd be prepared to offer those odds. I suggest that if you consider the procedures you do use, they give intervals roughly comparable to credible intervals from vague priors. – Scortchi - Reinstate Monica Aug 21 '18 at 20:24
  • @Scortchi, I agree. And of course, for practical reasons I am interested in CIs that are not the entire real line but as narrow as possible without violating the 95% (or whatever specified) coverage, i.e. the ones typically considered in textbooks (at least this is my perception). – Richard Hardy Aug 21 '18 at 20:29
  • Ha! Found it: https://stats.stackexchange.com/q/66407/17230. Here a CI open to similar objections is being seriously proposed. It might be the narrowest possible that maintains coverage for all I know. – Scortchi - Reinstate Monica Aug 21 '18 at 20:44
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/82039/discussion-between-scortchi-and-richard-hardy). – Scortchi - Reinstate Monica Aug 21 '18 at 20:48

2 Answers

The important thing about frequentist statistics is that parameters are not random variables. Rather, they are fixed quantities which happen to be unknown. It's therefore not legitimate to make probability statements about parameters. For example, the statement "the probability that the parameter is between 500 and 507 is 95%" is not a valid statement for a frequentist.

However, that does not rule out every probability statement which involves the parameter in some way. To be specific: there could be other sources of randomness in the statement. In my view some authors are too hawkish about banning this (it's much easier to do that than explain the nuance of the issue to students); but frequentists can actually make probability statements about confidence intervals, as long as they are made properly and very carefully.

To wit: you have to be absolutely clear that by "confidence interval", you are talking about the estimator (as you made clear in your question). Estimators are statistics, which means they are functions of the data, where the data comes from a random sample. They are, therefore, random variables.

The statement "the parameter is in the confidence interval" is therefore a random event in the technical sense that it describes a subset of the sample space (i.e. a subset of the possible outcomes of random sampling). Sometimes this event happens; sometimes it does not. And therefore, even for a frequentist, this statement can be assigned a probability; that is, you can say "the probability that the parameter is in the confidence interval is 95%".

Though it's a little more informative to phrase this as the following equivalent statement: "the probability that the confidence interval contains the parameter is 95%". This just makes it linguistically more clear that the confidence interval is the thing "doing" the randomness here, to which a probability is being assigned.
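
A minimal simulation sketch of this long-run statement, under assumptions that are not part of the answer itself: normally distributed data with a known standard deviation, the textbook z-interval, and a hypothetical true value of 503.5. The helper names `z_interval` and `coverage` are made up for illustration.

```python
import random
import statistics


def z_interval(sample, sigma, level=0.95):
    """Known-sigma z-interval for a normal mean (an assumed, textbook procedure)."""
    n = len(sample)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    mean = statistics.fmean(sample)
    half_width = z * sigma / n ** 0.5
    return mean - half_width, mean + half_width


def coverage(theta, sigma, n, reps, seed=0):
    """Fraction of repeated samples whose interval contains the fixed theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # The sample, and hence the interval, is what varies from draw to draw.
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma)
        hits += lo <= theta <= hi  # theta itself never changes
    return hits / reps


# Hypothetical values chosen only for illustration.
rate = coverage(theta=503.5, sigma=10.0, n=30, reps=20000)
print(f"empirical coverage: {rate:.3f}")
```

The printed proportion should land close to 0.95. Note that in the loop it is the interval that moves around the fixed `theta`, which is exactly the sense in which the interval is "doing" the randomness.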

The danger in this statement is that you calculate a specific confidence interval for a specific sample (and get [500, 507] for example), and your reference to "confidence interval" is taken to refer to this interval. But [500, 507] is not a statistic: it is the realisation of a statistic. It's just a pair of numbers. It's not random. And "the probability that the confidence interval contains the parameter is 95%" is not, under that interpretation, a valid statement.
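
The distinction can be written out in symbols. This is only a sketch, and the notation is introduced here rather than taken from the answer: $X$ denotes the random sample, $x$ the observed sample, and $L(\cdot)$, $U(\cdot)$ the functions giving the interval's endpoints.

$$
\Pr_\theta\bigl(L(X) \le \theta \le U(X)\bigr) = 0.95 \quad \text{for every } \theta,
\qquad \text{whereas} \qquad
\mathbf{1}\{L(x) \le \theta \le U(x)\} \in \{0, 1\}.
$$

The left-hand statement is about the random interval $[L(X), U(X)]$ before the data are seen. Once $x$ is observed, $[L(x), U(x)]$ (for example $[500, 507]$) either contains $\theta$ or it does not, and nothing random is left to which a frequentist probability could be attached.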

To summarise: yes, a frequentist actually can make statements about the probability of a parameter falling within a confidence interval, as long as it is absolutely clear that by "confidence interval" we are talking about an estimator / a statistic, not a specific estimate / the realisation of a statistic.

Denziloe
  • Thank you! I think I follow you. My question takes this further by saying that a statement about an estimator is not quite satisfactory (often not of direct interest), instead I care about an estimate (the concrete realization) and how confident I can be that it contains the true value. – Richard Hardy Aug 21 '18 at 20:11
  • I think if you think about it you'll find that what you're talking about when you make this statement really is a statement about the estimator and procedure of taking a random sample. Anything else doesn't actually make sense when you try to formalise it. Remember that "the concrete realisation" is just a pair of numbers. For example, say your confidence interval is [500, 507]. What would a probability statement about [500, 507] actually mean? There's nothing random there, they are just numbers. Only in the context of random sampling can you imbue them with the concept of randomness. – Denziloe Aug 21 '18 at 20:20
  • I agree with you. However, I feel there is more we can say about the realization, such as my statement with *confidence* in it, and I would like to formulate it nicely. After all, we are often interested in whether the particular realization of CI contains the true parameter value or not and with what level of confidence we could say that. Just stating the probabilistic properties of an estimator is not quite satisfactory in many cases. – Richard Hardy Aug 21 '18 at 20:26
  • "Confidence" is just an English word. As far as I know there is no mathematical definition for it. There is therefore no way to "formulate" your statement nicely in mathematical terms, because it isn't a mathematical statement to start with. The frequentist concept of confidence intervals *is* the frequentist attempt to capture the vague concept of "confidence" in concrete mathematical terms, and all that's to say about them has already been said. I'm not sure what you're looking for. – Denziloe Aug 21 '18 at 20:32
  • I see your point. My problem is, the traditional frequentist statements do not express the information we have about the particular estimate based on the information on the estimator, the latter information being explicit. In my opinion, not saying anything about an estimate is just throwing away information. I have already formulated this information in informal terms in my post, and I am looking for ways of expressing it in a formally correct way. I hope there is one in the frequentist paradigm. – Richard Hardy Aug 22 '18 at 06:19
  • I think the answer to what you're driving at is simply "no". With respect to a specific confidence interval for a parameter, a frequentist can't make any statements about confidence because confidence is a concept related to uncertainty and probabilities, but for a frequentist neither the specific confidence interval nor the parameter are random and no probabilistic statements can be made about them. This "limitation" is exactly why some people opt for a Bayesian approach. – Denziloe Aug 22 '18 at 09:28

But is this all that the frequentist paradigm has to offer on this question?

Yes.

Your two statements:

  1. The estimator of the CI guarantees that in the long run, 95% of such CIs will include the true parameter value.

  2. I am 95% confident that this particular interval includes the true value, based on the long-run properties of the estimator.

are exactly equivalent, because the "confident that" of statement 2 is synonymous with the long-run probabilities of statement 1, even without the dependent clause of statement 2. It is what it is. That said, the first clause of statement 2 is in my opinion a more concise and readable plain-language expression of a CI for a technically literate audience.

When I motivate CIs for my students I stress the meh of their formal meaning. However, I also offer as an explicitly loosey-goosey alternative definition that CIs provide a plausible range of values for $\theta$. "Loosey-goosey" because that's not the precise definition, and because the meaning of confidence level is, I feel, somewhat obscured in that interpretation. But it helps somewhat.

Alexis