Questions tagged [confidence-interval]

A confidence interval is an interval that covers an unknown parameter with $100(1-\alpha)\%$ confidence. Confidence intervals are a frequentist concept. They are often confused with credible intervals, which are the Bayesian analog.

A confidence interval is an interval that covers an unknown parameter of interest (e.g., the mean) with $100(1-\alpha)\%$ confidence. Confidence intervals are a frequentist concept. A credible interval is a related concept in Bayesian statistics. People often incorrectly ascribe the meaning of credible intervals to confidence intervals.

In frequentist statistics, a confidence interval for a parameter, $\theta$, is an interval computed from a set of data whose distribution depends on that parameter in some way. The interval is computed such that, if the process of drawing a sample and computing the interval were repeated identically ad infinitum, the proportion of the intervals that included the true value of the parameter would converge to $1-\alpha$ (i.e., $100(1-\alpha)\%$). This does not mean that the probability of a given interval including the true value of the parameter is $1-\alpha$. Each realized interval either includes the true value or it does not. The 'confidence' is a property of the procedure used to compute the interval and pertains to the theoretical infinite set of such intervals.
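The repeated-sampling interpretation can be illustrated with a small simulation. The following is a minimal sketch, not a definitive implementation: it assumes normally distributed data with a known standard deviation, so that the textbook z-interval for the mean applies, and the function names (`z_interval`, `simulate_coverage`) and all parameter values are hypothetical choices made for illustration.

```python
import random
import statistics

Z_95 = 1.959964  # 0.975 quantile of the standard normal distribution


def z_interval(sample, sigma, z=Z_95):
    """Two-sided z-interval for the mean, assuming sigma is known."""
    n = len(sample)
    center = statistics.fmean(sample)
    half_width = z * sigma / n ** 0.5
    return center - half_width, center + half_width


def simulate_coverage(mu=10.0, sigma=2.0, n=25, reps=20000, seed=0):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Draw a fresh sample and recompute the interval each time.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma)
        hits += lo <= mu <= hi  # each interval either covers mu or it does not
    return hits / reps


coverage = simulate_coverage()
print(f"empirical coverage over repeated samples: {coverage:.3f}")
```

Each simulated interval either contains `mu` or misses it; only the long-run fraction of hits is expected to settle near 0.95, which is the sense in which the confidence level describes the procedure rather than any single interval.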

Some additional notes:

  1. The confidence interval is a function of the data, $X$. Since the data are conceptualized as a random sample from a population, confidence intervals are random variables (although the confidence interval computed on a particular set of data is a realization).
  2. Often one can only compute approximate confidence intervals, which may attain the nominal coverage only asymptotically.
  3. If the data or the parameter can take only discrete or otherwise limited values, it may not be possible to compute an exact confidence interval that would otherwise be preferred.
  4. The same ideas can be applied to a set of parameters, e.g., $\vec{\theta} = [\mu\ \ \sigma^2]^T$. In that case, it is more correct to refer to the confidence region.
  5. In a regression context, the set of confidence intervals for all possible conditional means ($\mu_{Y|X=x}$) is called a confidence band.
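Notes 2 and 3 above can be made concrete with a binomial proportion. The sketch below is an illustration under stated assumptions rather than a recommended method: it uses the Wald (normal-approximation) interval as one example of an approximate interval and computes its actual coverage exactly by summing binomial probabilities. The helper names (`wald_interval`, `exact_coverage`) and the chosen values of `p` and `n` are hypothetical.

```python
from math import comb, sqrt

Z_95 = 1.959964  # 0.975 quantile of the standard normal distribution


def wald_interval(k, n, z=Z_95):
    """Approximate (Wald) interval for a proportion from k successes in n trials."""
    p_hat = k / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def exact_coverage(p, n, z=Z_95):
    """Exact probability that the Wald interval contains the true p."""
    total = 0.0
    for k in range(n + 1):
        lo, hi = wald_interval(k, n, z)
        if lo <= p <= hi:
            # Add the binomial probability of observing exactly k successes.
            total += comb(n, k) * p ** k * (1 - p) ** (n - k)
    return total


for p in (0.5, 0.1, 0.02):
    print(f"p = {p}: actual coverage of nominal 95% interval = {exact_coverage(p, n=30):.4f}")
```

Because the number of successes is discrete, the actual coverage moves in jumps as `p` or `n` changes and generally differs from the nominal 95%, sometimes substantially when `p` is close to 0 or 1.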
4058 questions
292 votes, 10 answers

What's the difference between a confidence interval and a credible interval?

Joris and Srikant's exchange here got me wondering (again) if my internal explanations for the difference between confidence intervals and credible intervals were the correct ones. How would you explain the difference?
280 votes, 16 answers

Why does a 95% Confidence Interval (CI) not imply a 95% chance of containing the mean?

It seems that through various related questions here, there is consensus that the "95%" part of what we call a "95% confidence interval" refers to the fact that if we were to exactly replicate our sampling and CI-computation procedures many times,…
Mike Lawrence
119 votes, 6 answers

Difference between confidence intervals and prediction intervals

For a prediction interval in linear regression you still use $\hat{E}[Y|x] = \hat{\beta}_0+\hat{\beta}_{1}x$ to generate the interval. You also use this to generate a confidence interval of $E[Y|x_0]$. What's the difference between the two?
103 votes, 12 answers

What, precisely, is a confidence interval?

I know roughly and informally what a confidence interval is. However, I can't seem to wrap my head around one rather important detail: According to Wikipedia: A confidence interval does not predict that the true value of the parameter has a…
dsimcha
101 votes, 3 answers

What are examples where a "naive bootstrap" fails?

Suppose I have a set of sample data from an unknown or complex distribution, and I want to perform some inference on a statistic $T$ of the data. My default inclination is to just generate a bunch of bootstrap samples with replacement, and calculate…
raegtin
93 votes, 9 answers

Are there any examples where Bayesian credible intervals are obviously inferior to frequentist confidence intervals?

A recent question on the difference between confidence and credible intervals led me to start re-reading Edwin Jaynes' article on that topic: Jaynes, E. T., 1976. "Confidence Intervals vs Bayesian Intervals," in Foundations of Probability Theory,…
Dikran Marsupial
87 votes, 3 answers

Shape of confidence interval for predicted values in linear regression

I have noticed that the confidence interval for predicted values in a linear regression tends to be narrow around the mean of the predictor and fat around the minimum and maximum values of the predictor. This can be seen in plots of these 4 linear…
luciano
73 votes, 4 answers

A psychology journal banned p-values and confidence intervals; is it indeed wise to stop using them?

On 25 February 2015, the journal Basic and Applied Social Psychology issued an editorial banning $p$-values and confidence intervals from all future papers. Specifically, they say (formatting and emphasis are mine): [...] prior to publication,…
amoeba
70 votes, 6 answers

Do the predictions of a Random Forest model have a prediction interval?

If I run a randomForest model, I can then make predictions based on the model. Is there a way to get a prediction interval for each of the predictions, so that I know how "sure" the model is of its answer? If this is possible, is it simply based on…
Dean MacGregor
63 votes, 3 answers

Explain the xkcd jelly bean comic: What makes it funny?

I see that one time out of the twenty total tests they run, $p < 0.05$, so they wrongly assume that during one of the twenty tests, the result is significant ($0.05 = 1/20$). xkcd jelly bean comic - "Significant" Title: Significant Hover text:…
59 votes, 4 answers

Confidence interval for Bernoulli sampling

I have a random sample of Bernoulli random variables $X_1, \dots, X_N$, where the $X_i$ are i.i.d. r.v. and $P(X_i = 1) = p$, and $p$ is an unknown parameter. Obviously, one can find an estimate for $p$: $\hat{p}:=(X_1+\dots+X_N)/N$. My question is how can…
59 votes, 4 answers

Are all values within a 95% confidence interval equally likely?

I have found discordant information on the question: "If one constructs a 95% confidence interval (CI) of a difference in means or a difference in proportions, are all values within the CI equally likely? Or, is the point estimate the most likely,…
pmgjones
59 votes, 5 answers

Is it true that the percentile bootstrap should never be used?

In the MIT OpenCourseWare notes for 18.05 Introduction to Probability and Statistics, Spring 2014 (currently available here), it states: The bootstrap percentile method is appealing due to its simplicity. However it depends on the bootstrap…
Clarinetist
59 votes, 1 answer

Bootstrap vs. jackknife

Both bootstrap and jackknife methods can be used to estimate the bias and standard error of an estimate, and the mechanisms of the two resampling methods are not hugely different: sampling with replacement vs. leaving out one observation at a time. However,…
Tu.2
57 votes, 8 answers

Why continue to teach and use hypothesis testing (when confidence intervals are available)?

Why continue to teach and use hypothesis testing (with all its difficult concepts and which are among the most statistical sins) for problems where there is an interval estimator (confidence, bootstrap, credibility or whatever)? What is the best…