Which criterion for comparing two means with 95% confidence intervals?

Question

To improve statistical reliability and scientific reproducibility, several authors have proposed reporting a confidence interval for the difference between two means rather than a p-value.

To support statistical inference from confidence intervals, some researchers have given visual guides for judging the overlap between confidence intervals and inferring statistical significance from that overlap.

However, I recently read papers that do not give the same criterion.

In particular, for testing significance with two independent samples, Andy Field in his book and Cumming (2009) say:

For a comparison of two independent means, two-tailed p = 0.05 when the overlap of the 95 per cent CIs is no more than about half the average margin of error, that is when POL is about 0.5 or less.
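
To make the quoted rule concrete, here is a minimal R sketch of the proportion overlap (POL), taken here to mean the overlap of the two 95% CIs divided by the average margin of error, as the quote describes. The function name `proportion_overlap` and its arguments are hypothetical, and the margins of error are assumed to be the usual t-based ones for each sample mean.

```r
# Hypothetical helper (not from Cumming's paper): proportion overlap (POL)
# of the CIs around two independent sample means.
proportion_overlap <- function(x, y, conf_level = 0.95) {
  # t-based margin of error of a single sample mean
  moe <- function(v) {
    qt(1 - (1 - conf_level) / 2, df = length(v) - 1) * sd(v) / sqrt(length(v))
  }
  moe_x <- moe(x)
  moe_y <- moe(y)
  # Upper limit of the lower CI minus lower limit of the higher CI;
  # a negative value means there is a gap between the intervals.
  overlap <- moe_x + moe_y - abs(mean(x) - mean(y))
  overlap / mean(c(moe_x, moe_y))
}

x <- c(4.1, 5.3, 4.8, 5.9, 5.0, 4.6)
proportion_overlap(x, x)        # identical samples: the CIs coincide, so POL is 2
proportion_overlap(x, x + 10)   # far apart: negative POL, i.e. a gap
```

Under the quoted rule, a POL of about 0.5 or less would be read as roughly p ≤ 0.05.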

However, in another article, Pfister & Janczyk (2013) give this criterion:

Importantly, conclusions based on the CI are valid only for the difference between the means, and the CI thus corresponds to the t-test for two independent samples. If centered around one of the means this test is significant if, and only if, the CI does not include the other mean.

This is less restrictive.

What do you think? Should we always use the more restrictive criterion? Or am I missing something that explains the difference between these two articles?
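
One possible reading of the Pfister & Janczyk quote is that their interval is the CI for the difference between the means, re-centered on one of the sample means, rather than the CI of a single mean. The R sketch below checks that reading on simulated data. The sample sizes, the mean shift of 0.6, the seed, and the choice of the pooled-variance test (`var.equal = TRUE`) are all assumptions made for illustration.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rnorm(30, mean = 0)
y <- rnorm(30, mean = 0.6)

# Two-sample t-test; its conf.int is the 95% CI for mean(x) - mean(y)
tt <- t.test(x, y, var.equal = TRUE)

# Margin of error of the difference = half the width of that CI
moe_diff <- as.numeric(diff(tt$conf.int)) / 2

# Re-center the difference CI on the mean of x
ci_around_x <- mean(x) + c(-1, 1) * moe_diff

# Does this interval exclude the mean of y?
excludes_y <- mean(y) < ci_around_x[1] || mean(y) > ci_around_x[2]

# The two statements agree: exclusion happens exactly when p < 0.05
c(excludes_y = excludes_y, significant = tt$p.value < 0.05)
```

Under this reading the criterion is simply the t-test drawn as an interval, so it is a different interval from the single-mean CIs whose overlap Cumming's rule describes.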

Thank you for your answers.

References:

Cumming, G. (2009). Inference by eye: reading the overlap of independent confidence intervals. Statistics in Medicine, 28(2), 205-220.

Pfister, R., & Janczyk, M. (2013). Confidence intervals for two sample means: Calculation, interpretation, and a few simple rules. Advances in Cognitive Psychology, 9(2), 74.

Relevant: https://stats.stackexchange.com/questions/250269/cumming-2008-claims-that-distribution-of-p-values-obtained-in-replications-dep — kjetil b halvorsen, Aug 30 '21 at 05:06

score 0 · Answer 1 · answered Jul 27 '19 at 14:45

To support statistical inference from confidence intervals, some researchers have given visual guides for judging the overlap between confidence intervals and inferring statistical significance from that overlap.

If this is true, I find this to be bad practice. Statistical visualizations are not bulletproof and are certainly not a tool for assessing the probability of observing a result at least as extreme as the one we have obtained under the null.

When people say that we shouldn't rely on the p value too heavily, what I think they mean is that the p value should not be our sole measure. IMO, a p value should always be accompanied by a confidence interval so that we can see what differences are consistent with the data we observed.

You are right, thanks for your answer. Maybe I went a little too far in the way I transcribed their intentions. However, my question remains relevant since, as you said, it should be used to support statistical test results :) — Sylvain Penaud, Jul 27 '19 at 16:39

score 0 · Answer 2 · answered Jul 27 '19 at 21:53

I completely agree with Demetri. But to answer your question above, i.e. whether these simple rules are good replacements for actual significance testing, you could try running a small simulation. For example, if I take the second one (i.e. reject the null hypothesis when the mean of one sample falls outside the confidence interval around the other mean), you can see that we end up rejecting the null hypothesis too often:

B <- 1000
n <- 100
z_critical <- qnorm(0.975)

results <- replicate(B, {
  # Generate random variables with same mean
  X <- rnorm(n)
  Y <- rnorm(n)

  mean_x <- mean(X)
  sigma_x <- sd(X)
  mean_y <- mean(Y)

  # Compute confidence interval
  x_confInt <- c(mean_x - z_critical*sigma_x/sqrt(n), 
                 mean_x + z_critical*sigma_x/sqrt(n))

  # Check if other mean is within CI
  (mean_y >= x_confInt[1] && mean_y <= x_confInt[2])
})

# We would expect 0.05 on average
1 - mean(results)

[1] 0.171

You can replace the normal distribution critical value by one from the $t$-distribution and you would also get inflated type I error.
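
As a hedged extension of the simulation above, the following sketch compares three decision rules under the null hypothesis: rejecting when the other mean falls outside a sample's own CI, rejecting when the proportion overlap of the two CIs is at most 0.5 (the rule quoted from Cumming), and rejecting when the other mean falls outside the CI of the difference centered on one mean (one reading of Pfister & Janczyk). The seed, `B`, `n`, and the pooled-variance t-test are assumptions chosen for illustration.

```r
set.seed(123)   # arbitrary seed, for reproducibility only
B <- 2000
n <- 100

rejections <- replicate(B, {
  # Two samples with the same true mean, so every rejection is a type I error
  x <- rnorm(n)
  y <- rnorm(n)
  d <- abs(mean(x) - mean(y))

  # Margins of error of each single-mean 95% CI
  t_crit <- qt(0.975, df = n - 1)
  moe_x <- t_crit * sd(x) / sqrt(n)
  moe_y <- t_crit * sd(y) / sqrt(n)

  # Margin of error of the 95% CI for the difference (pooled t-test)
  tt <- t.test(x, y, var.equal = TRUE)
  moe_diff <- as.numeric(diff(tt$conf.int)) / 2

  c(own_ci       = d > moe_x,                                        # mean_y outside CI of mean_x
    overlap_rule = (moe_x + moe_y - d) / mean(c(moe_x, moe_y)) <= 0.5, # POL <= 0.5
    diff_ci      = d > moe_diff)                                     # same decision as the t-test
})

# Estimated type I error rate of each rule
rates <- rowMeans(rejections)
rates
```

One would expect the first rate to be well above 0.05, the third to be close to 0.05, and the overlap rule to land near or somewhat below 0.05, which suggests the two papers are describing different intervals rather than contradicting each other.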

I'm reading this as "Only reject when neither CI contains the other mean". Am I interpreting correctly? — Dave, Jul 27 '19 at 22:28
@Dave That's a good point, I didn't read it that way at first, but it definitely is a valid interpretation. — M Turgeon, Jul 27 '19 at 23:17
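
For what it is worth, the alternative reading raised in the comments ("only reject when neither CI contains the other mean") can be checked with the same kind of simulation. This is a minimal sketch, assuming single-mean CIs with normal critical values as in the answer's code; the seed and `B` are arbitrary.

```r
set.seed(42)    # arbitrary seed, for reproducibility only
B <- 2000
n <- 100
z_critical <- qnorm(0.975)

sim <- replicate(B, {
  x <- rnorm(n)
  y <- rnorm(n)
  d <- abs(mean(x) - mean(y))
  out_x <- d > z_critical * sd(x) / sqrt(n)   # mean_y outside the CI around mean_x
  out_y <- d > z_critical * sd(y) / sqrt(n)   # mean_x outside the CI around mean_y
  c(one_ci = out_x, both_cis = out_x && out_y)
})

# Estimated type I error rates under each reading
rowMeans(sim)
```

With equal sample sizes and equal variances the two margins of error are nearly identical, so requiring both exclusions should lower the rejection rate only slightly.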
