
I am trying to understand confidence intervals better.

Suppose two different labs ran the same experiment with the same number of samples, and both got the same mean value, but their 95% CIs differed.

Lab 1 got a mean value of 1.6 & 95% CI: 1.0 to 2.2

Lab 2 got a mean value of 1.6 & 95% CI: 1.5 to 1.7

What could you actually deduce from this?

If you took each result separately, would you be any more confident that the true mean was 1.6 based on the results from Lab 2? I feel like I am not really understanding what a CI is and am misinterpreting it.

Nathan
    @mkt I suspect you intended to write the opposite of some of the things you did. First, it is clear that the variance was much *smaller* in the Lab 2 sample than in the Lab 1 sample. Second, your reference to "probability density" sounds like you are taking these to be Bayesian *credible intervals* rather than confidence intervals, because it isn't clear to what "probability density" actually refers. As such, your comment might confuse the issue more than clarify it. – whuber Jul 17 '17 at 13:53
    This is a problematic question because although you *can* deduce something about the mean by combining these CIs, something else unrelated to CIs is going on. Because one interval is so much narrower than the other, you have produced significant evidence that the *variances* observed by the labs differ. That would make one hesitate to interpret the CIs (or to combine them) until the reason for this difference in variances is better understood. – whuber Jul 17 '17 at 14:11
  • As whuber said, in your case it means that there is a significant difference in variance between the samples. A rule of thumb is that, when the C.I.s of the samples do not overlap, then there is a significant difference between the sample statistics (not your case). Another one is that, when a C.I. contains 0 in its range (i.e. one of the interval endpoints is negative), then you simply don't have good enough data to estimate an interval at that confidence level. – Digio Jul 17 '17 at 14:27
    @whuber Fair points, that was sloppy writing. I am deleting my previous comment. – mkt Jul 17 '17 at 14:48
  • @Digio Please be careful, lest you confuse readers even more. Although it's correct that non-overlap of CIs often implies significance, the inverse is not true. See https://stats.stackexchange.com/questions/18215. Moreover, a CI has nothing whatsoever to do with containing zeros or not: you must be thinking of using a CI in a particular kind of null hypothesis test, one that is not germane to this discussion. Your characterization of a CI is [disputed by almost everyone;](https://stats.stackexchange.com/questions/26450) in particular, (1) is misleading and (2) is strikingly false. – whuber Jul 17 '17 at 17:35
  • @whuber you must have misunderstood my comment on the [inclusion of zero](http://onlinestatbook.com/2/logic_of_hypothesis_testing/sign_conf.html) in a CI because what I was trying to explain is more or less [common knowledge](https://stats.stackexchange.com/questions/120949/why-does-a-confidence-interval-including-0-mean-the-difference-is-not-significan). Also, I said exactly the same thing as you did: non-overlapping intervals _imply_ significant difference between two sample statistics - I'm not sure what you mean by "the inverse is not true" but I didn't mention anything like that. – Digio Jul 18 '17 at 09:12
  • @whuber, My first interpretation of CI is formally wrong as the population mean is a constant and not a random variable, nonetheless, it is the standard interpretation in the applications of CI, e.g. in regression analysis it is the ["estimate of an interval in which future observations will fall, with a certain probability"](https://en.wikipedia.org/wiki/Prediction_interval). – Digio Jul 18 '17 at 09:41
  • My second interpretation is actually a [textbook definition](https://stattrek.com/estimation/confidence-interval.aspx) and perhaps one of the strictest since "confidence" is not interpreted as "probability". As described [here](http://onlinelibrary.wiley.com/doi/10.1002/asi.23744/abstract): _"The confidence interval describes the level of uncertainty of a sampling method. The statistic and the margin of error define an interval estimate that describes the precision of the method"_. – Digio Jul 18 '17 at 10:03
  • The above definition (2) can be extended using the property of [long run relative frequency](http://www.stats.org.uk/probability/frequency.html) and, under the assumption that our sampling method is unbiased, we can derive interpretation (1), where "confidence" is viewed as a probability and the sample statistic as the population parameter (I can cite [sources](https://www.amazon.co.uk/Statistical-Methods-Epidemiologic-Research-Merrill/dp/1284050203) for this reasoning too). If you still find my comments misleading then I will remove them. – Digio Jul 18 '17 at 10:08
  • Your statement beginning "if you repeat..." simply is untrue. Repetition of an experiment does not ensure that an estimate will fall within a confidence interval. The estimate will vary according to its sampling distribution and the confidence interval is uncertain. What do you think will happen if that CI happens to be one of the 5% that does not cover the true parameter? When the experiment is repeated, then most of the time the estimate will *not* lie within that particular CI. – whuber Jul 18 '17 at 12:10
  • The repetition of an experiment an _infinite_ number of times is a theoretical condition required to interpret a frequency as a probability, but I didn't say that it affects the frequency. From a frequency of infinite repetitions you can deduce the probability of having the population parameter inside the CI of a single experiment. A very similar explanation is provided [here](https://stackoverflow.com/questions/42799002/how-to-interpret-the-upper-lower-bound-of-a-datapoint-with-confidence-intervals). I would change my wording to avoid all confusion but since I can't edit I will just remove. – Digio Jul 19 '17 at 09:00
  • [Edited] Interpretations: (1) A confidence interval is associated to a population statistic, i.e. it is the range of values in which the population statistic lies, within a certain probability (confidence); (2) A confidence interval is associated to a sampling method, i.e. if you repeat the same sampling process for an infinite number of times, the fraction of intervals that contain the true population parameter will be equal to (e.g.) 95%. – Digio Jul 19 '17 at 09:37
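To make whuber's point about the variances concrete: if one assumes (hypothetically) that both labs used equal sample sizes and standard normal-based intervals (mean ± 1.96·SE), the implied standard errors can be backed out from the interval half-widths:

```python
# Back out the implied standard errors from the reported 95% CIs,
# assuming normal-based intervals (mean +/- 1.96*SE) and equal n.
z = 1.96
half_width_lab1 = (2.2 - 1.0) / 2  # Lab 1 half-width: 0.6
half_width_lab2 = (1.7 - 1.5) / 2  # Lab 2 half-width: 0.1
se1 = half_width_lab1 / z
se2 = half_width_lab2 / z
ratio = se1 / se2
print(ratio)       # ~6: Lab 1's standard error is six times larger
print(ratio ** 2)  # ~36: implied sample-variance ratio, if n is equal
```

Under these assumptions the sample variances would differ by a factor of roughly 36, which is why the comments above hesitate to combine the two intervals.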

1 Answer


Confidence intervals are tricky things, but your intuition is right. The fact that Lab 2 reports a narrower interval means that it is more confident that the true mean is close to 1.6. The tricky part is what "confidence" actually means here.

What it does NOT mean is that there is a 95% probability that the true mean falls within this particular interval. What it does mean is that 95% of intervals constructed in this way will contain the true mean. Whether your interval is one of those 95% or one of the unlucky 5%, one cannot say.
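This "coverage" interpretation can be checked with a quick simulation (the true mean, SD, and sample size below are made-up values, and the 1.96 normal quantile is used as an approximation):

```python
import random
import statistics

random.seed(42)

true_mean, true_sd, n = 1.6, 0.5, 30
z = 1.96  # two-sided 95% normal quantile (approximation to the t quantile)

trials = 10_000
covered = 0
for _ in range(trials):
    sample = [random.gauss(true_mean, true_sd) for _ in range(n)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    # Does this replication's CI contain the true mean?
    if m - z * se <= true_mean <= m + z * se:
        covered += 1

print(covered / trials)  # close to 0.95
```

Roughly 95% of the simulated intervals cover the true mean, but any single interval either contains it or it does not.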

As a confidence interval gets narrower, it becomes "harder" for 95% of such intervals to cover the true mean, so if the procedure still succeeds 95% of the time with a narrower interval, the measurement is more precise. The usual way to achieve this is to increase the sample size, since the interval's width shrinks in proportion to 1/√n.
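As a small sketch of that scaling (the SD of 0.5 is an assumed value): quadrupling the sample size halves the width of a normal-based 95% CI for the mean.

```python
# Width of a normal-based 95% CI for the mean, for a fixed population SD.
sd, z = 0.5, 1.96  # assumed SD; 1.96 is the two-sided 95% normal quantile
widths = {}
for n in (10, 40, 160):
    widths[n] = 2 * z * sd / n ** 0.5  # full interval width
    print(n, round(widths[n], 2))
```

Each fourfold increase in n cuts the width in half, because the standard error of the mean is sd/√n.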

I personally find that this video from Dr. Nic explains it rather well.

Maarten Punt
  • +1. Though I would say that both interpretations are generally acceptable depending on the branch of statistics. – Digio Jul 17 '17 at 14:41