Why would a 99% confidence interval exclude 1/4 of the values in the sample data?

Question

Take the following data set of eight observations:

data = 1.22, 0.93, 1.03, 1.45, 1.07, 1.09, 1.17, 1.20

To compute a confidence interval on this data, I use the following formula:

CI = mean(data) +/- t.crit x (sd(data)/sqrt(df))

... and the results are as follows:

mean(data) = 1.14

sd(data) = 0.155

df = 7

t.crit = 3.499 (.01 in two tails).

CI = 0.94 to 1.35

The strange thing here is that two of the eight observations in the sample (0.93 and 1.45) fall outside of this confidence interval. That seems counter-intuitive to me; shouldn't the presence of these data in the sample have increased the standard deviation enough such that they would fall within the confidence interval?
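
For reference, the calculation above can be reproduced with a minimal Python sketch using only the standard library. It hard-codes the critical value 3.499 quoted above rather than computing it. One assumption to flag: the formula as written divides by sqrt(df), whereas the conventional standard error of the mean divides by sqrt(n), so the sketch computes both variants. The helper name `interval` is purely illustrative.

```python
import math
import statistics

data = [1.22, 0.93, 1.03, 1.45, 1.07, 1.09, 1.17, 1.20]

n = len(data)
df = n - 1
mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
t_crit = 3.499               # two-tailed 1% critical value for df = 7, as quoted above


def interval(divisor):
    """Return (lower, upper) for mean +/- t_crit * sd / sqrt(divisor)."""
    half_width = t_crit * sd / math.sqrt(divisor)
    return mean - half_width, mean + half_width


ci_df = interval(df)  # as in the formula above (divides by sqrt(df))
ci_n = interval(n)    # conventional standard error (divides by sqrt(n))

# Observations that fall outside the conventional interval
outside = [x for x in data if not (ci_n[0] <= x <= ci_n[1])]

print(f"mean = {mean:.3f}, sd = {sd:.3f}")
print(f"CI using sqrt(df): {ci_df[0]:.2f} to {ci_df[1]:.2f}")
print(f"CI using sqrt(n):  {ci_n[0]:.2f} to {ci_n[1]:.2f}")
print("outside:", outside)
```

Under either divisor the interval is a statement about the mean, and the same two observations (0.93 and 1.45) lie outside it.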

Find out what happens when you perform the same computation by replicating the data many times. For instance, replace `data` by `rep(data, 100)`. Then visit some of our higher-voted threads on interpreting confidence intervals, such as https://stats.stackexchange.com/questions/26450/why-does-a-95-confidence-interval-ci-not-imply-a-95-chance-of-containing-the/26457#26457. — whuber, Apr 20 '20 at 19:32
1) Why would you expect a $99\%$ anything to cover $100\%$? 2) The confidence interval concerns the mean of your data, so if you had many thousands of observations, your confidence interval would be quite narrow and exclude most of your observed values. — Dave, Apr 20 '20 at 19:32
You're right, it seems I needed a refresher on what a confidence interval actually represents: an estimate of the population mean, not the likelihood that a value observed in the sample would occur again. Thanks — Adam Young, Apr 20 '20 at 21:33
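
The experiment suggested in the comments can be sketched in Python as follows. This is an illustration under stated assumptions, not the commenters' own code: list repetition `data * k` stands in for `rep(data, k)`, the normal quantile is used as a large-sample approximation to the t critical value (reasonable here because every replicated sample has at least 80 observations), and the helper name `share_inside` is hypothetical.

```python
import math
import statistics

data = [1.22, 0.93, 1.03, 1.45, 1.07, 1.09, 1.17, 1.20]

# Large-sample stand-in for the two-tailed 1% t critical value (an approximation)
z_crit = statistics.NormalDist().inv_cdf(0.995)


def share_inside(sample):
    """Return (lower, upper, fraction of observations inside the 99% CI for the mean)."""
    n = len(sample)
    mean = statistics.mean(sample)
    half_width = z_crit * statistics.stdev(sample) / math.sqrt(n)
    lower, upper = mean - half_width, mean + half_width
    inside = sum(lower <= x <= upper for x in sample) / n
    return lower, upper, inside


# Replicate the data k times, analogous to rep(data, k)
results = {k: share_inside(data * k) for k in (10, 100, 1000)}

for k, (lower, upper, inside) in results.items():
    print(f"k = {k:4d}: CI = {lower:.3f} to {upper:.3f}, share of data inside = {inside:.3f}")
```

The spread of the observations barely changes as the data are replicated, but the interval for the mean keeps shrinking, so fewer and fewer observations fall inside it.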
