
Take the following data set of eight observations:

data = 1.22, 0.93, 1.03, 1.45, 1.07, 1.09, 1.17, 1.20

To compute a 99% confidence interval for the mean of these data, I use the following formula:

CI = mean(data) +/- t.crit x (sd(data)/sqrt(n))

... and the results are as follows:

mean(data) = 1.145

sd(data) = 0.156

n = 8, df = n - 1 = 7

t.crit = 3.499 (.01 in two tails).

CI = 0.95 to 1.34

The strange thing here is that two of the eight observations in the sample (0.93 and 1.45) fall outside of this confidence interval. That seems counter-intuitive to me; shouldn't the presence of these data in the sample have increased the standard deviation enough such that they would fall within the confidence interval?
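
For reference, the calculation can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not the exact code behind the numbers above: the critical value 3.499 is hard-coded from the figure quoted above rather than computed, the variable names are illustrative, and the standard error is taken as sd/sqrt(n) with n = 8 observations, which is the conventional form (dividing by sqrt(df) instead would give a slightly wider interval).

```python
import math
import statistics

data = [1.22, 0.93, 1.03, 1.45, 1.07, 1.09, 1.17, 1.20]

n = len(data)
mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation (n - 1 in the denominator)
t_crit = 3.499               # assumption: taken from the question (df = 7, 0.01 in two tails)

# Half-width of the interval: critical value times the standard error of the mean.
half_width = t_crit * sd / math.sqrt(n)
ci_low, ci_high = mean - half_width, mean + half_width

# Observations that fall outside the interval for the mean.
outside = [x for x in data if x < ci_low or x > ci_high]

print(f"mean = {mean:.3f}, sd = {sd:.3f}")
print(f"99% CI for the mean: {ci_low:.2f} to {ci_high:.2f}")
print("observations outside the interval:", outside)
```

Either way the standard error is computed, 0.93 and 1.45 land outside the interval.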

  • Find out what happens when you perform the same computation by replicating the data many times. For instance, replace `data` by `rep(data, 100)`. Then visit some of our higher-voted threads on interpreting confidence intervals, such as https://stats.stackexchange.com/questions/26450/why-does-a-95-confidence-interval-ci-not-imply-a-95-chance-of-containing-the/26457#26457. – whuber Apr 20 '20 at 19:32
  • 1) Why would you expect a $99\%$ anything to cover $100\%$? 2) The confidence interval concerns the mean of your data, so if you had many thousands of observations, your confidence interval would be quite narrow and exclude most of your observed values. – Dave Apr 20 '20 at 19:32
  • You're right, it seems I needed a refresher on what a confidence interval actually represents - an estimate of the population mean, not the likelihood that a value that had occurred in the sample would occur again. Thanks – Adam Young Apr 20 '20 at 21:33
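
The replication experiment suggested in the comments can be sketched as follows, in Python rather than R, with `data * 100` playing the role of `rep(data, 100)`. Two assumptions are worth flagging: for the 800-value sample, a normal critical value from `statistics.NormalDist` stands in for the t quantile (a close approximation at that sample size), and the helper names `mean_ci` and `fraction_inside` are hypothetical, introduced only for this illustration.

```python
import math
import statistics

data = [1.22, 0.93, 1.03, 1.45, 1.07, 1.09, 1.17, 1.20]

def mean_ci(sample, crit):
    """Return (low, high) for mean +/- crit * sd / sqrt(n)."""
    m = statistics.mean(sample)
    half = crit * statistics.stdev(sample) / math.sqrt(len(sample))
    return m - half, m + half

def fraction_inside(sample, low, high):
    """Share of observations lying within [low, high]."""
    return sum(low <= x <= high for x in sample) / len(sample)

t_crit_small = 3.499  # from the question: df = 7, 0.01 in two tails
# Assumption: normal approximation to the t quantile for the large sample.
z_crit_large = statistics.NormalDist().inv_cdf(0.995)

replicated = data * 100  # same eight values repeated 100 times (n = 800)

ci_small = mean_ci(data, t_crit_small)
ci_large = mean_ci(replicated, z_crit_large)

frac_small = fraction_inside(data, *ci_small)
frac_large = fraction_inside(replicated, *ci_large)
sd_small = statistics.stdev(data)
sd_large = statistics.stdev(replicated)

print(f"n = 8:   sd = {sd_small:.3f}, CI = {ci_small[0]:.3f} to {ci_small[1]:.3f}, "
      f"share of observations inside = {frac_small:.2f}")
print(f"n = 800: sd = {sd_large:.3f}, CI = {ci_large[0]:.3f} to {ci_large[1]:.3f}, "
      f"share of observations inside = {frac_large:.2f}")
```

The standard deviation barely changes under replication, but the interval narrows roughly in proportion to the square root of the sample size, so it ends up excluding every observed value. The interval describes uncertainty about the mean, not the spread of individual observations.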
