We are trying to prove that three groups have the same proportion of some effect. We are struggling with formulating the null hypothesis, and with whether or not the sample size is big enough. The proportions are 37/213, 55/344 and 32/210. This seems like a very common scenario, but a naive Google search was not that helpful.

A very similar Q: https://stats.stackexchange.com/questions/77965/sample-size-calculation-for-comparison-of-3-proportions – kjetil b halvorsen Mar 05 '19 at 08:51
Now that you have data, it is somewhat late for a power analysis, though not for planning a future confirmatory experiment. The hypothesis of equal proportions in three groups is the hypothesis of homogeneity in a contingency table. Your data are (I am using R):
yourtable
G1 G2 G3
x 37 55 32
total 213 344 210
The proportions are
sapply(as.data.frame(yourtable), function(x) x[1]/x[2])
G1 G2 G3
0.1737089 0.1598837 0.1523810
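In symbols, with x_j cases out of n_j in group j and true proportions p_j, the homogeneity hypothesis and the Pearson statistic that chisq.test() computes on the 2 x 3 table of cases and non-cases are (the standard textbook formulation):

$$
H_0:\ p_1 = p_2 = p_3 \qquad\text{vs.}\qquad H_1:\ \text{not all } p_j \text{ are equal}
$$
$$
\hat p = \frac{\sum_j x_j}{\sum_j n_j} = \frac{124}{767} \approx 0.1617,\qquad
E_{1j} = n_j \hat p,\qquad E_{2j} = n_j (1-\hat p)
$$
$$
X^2 = \sum_{i=1}^{2}\sum_{j=1}^{3}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} \ \dot\sim\ \chi^2_{(2-1)(3-1)} = \chi^2_2 \quad\text{under } H_0,
$$

where O_{1j} = x_j and O_{2j} = n_j - x_j are the observed counts.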
For the homogeneity test we need this in the form of a contingency table:
con.table <- yourtable
con.table[2, ] <- con.table[2, ]-con.table[1, ]
row.names(con.table) <- c("x", "n-x")
chisq.test(con.table)
Pearson's Chi-squared test
data: con.table
X-squared = 0.36957, df = 2, p-value = 0.8313
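As a check on what chisq.test() does here, the sketch below recomputes the Pearson statistic by hand from the pooled expected counts, and also runs prop.test(), base R's test for equality of several proportions, which takes the counts and group sizes directly. The variable names are illustrative; the data are those from the question.

```r
# Counts of the effect (x) and group sizes (n) from the question
x <- c(G1 = 37, G2 = 55, G3 = 32)
n <- c(G1 = 213, G2 = 344, G3 = 210)
con.table <- rbind(x = x, `n-x` = n - x)   # same 2 x 3 table as above

# Under homogeneity every group shares the pooled proportion
p.pooled <- sum(x) / sum(n)                 # 124 / 767
expected <- rbind(n * p.pooled, n * (1 - p.pooled))

# Pearson statistic over all six cells, with (2 - 1) * (3 - 1) = 2 degrees of freedom
X2 <- sum((con.table - expected)^2 / expected)
dof <- (nrow(con.table) - 1) * (ncol(con.table) - 1)
p.value <- pchisq(X2, df = dof, lower.tail = FALSE)
c(X2 = X2, df = dof, p.value = p.value)

# prop.test() does the same from counts and totals; with more than two groups
# it applies no continuity correction, so its statistic equals X2
pt <- prop.test(x, n)
c(X2 = unname(pt$statistic), p.value = pt$p.value)
```

All three routes give X-squared of about 0.3696 on 2 degrees of freedom with a p-value of about 0.831, matching the chisq.test() output above.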
So there is no evidence against homogeneity, but what you need is evidence for homogeneity. I would first look at confidence intervals, which can be calculated via logistic regression:
my.df <- as.data.frame(t(con.table))
my.df$G <- rownames(my.df)
logmod <- glm(cbind(x, `n-x`) ~ 0 + G, family = binomial, data = my.df)
logmod
Call: glm(formula = cbind(x, `n-x`) ~ 0 + G, family = binomial, data = my.df)
Coefficients:
GG1 GG2 GG3
-1.560 -1.659 -1.716
Degrees of Freedom: 3 Total (i.e. Null); 0 Residual
Null Deviance: 385
Residual Deviance: -3.553e-14 AIC: 22.08
confint(logmod)
Waiting for profiling to be done...
2.5 % 97.5 %
GG1 -1.928812 -1.217750
GG2 -1.957458 -1.379635
GG3 -2.110139 -1.354896
You could back-transform these confidence intervals to the probability scale and see whether they are narrow enough for your purposes. Otherwise you would need to collect more data; sample size calculations could be based on simulations from this model, or you could look at ideas from the questions "Experimental Design on Testing Proportions" or "Sample size for logistic regression?".
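A minimal sketch of the back-transformation, assuming the same saturated model as above. The data frame is rebuilt here with a column n of group totals (instead of the n-x column) only so that the block runs on its own. Because plogis() is the inverse of the logit link and is increasing, applying it to both ends of each log-odds interval gives an interval for the proportion itself.

```r
# Data from the question, one row per group
my.df <- data.frame(G = c("G1", "G2", "G3"),
                    x = c(37, 55, 32),
                    n = c(213, 344, 210))

# Same saturated model as above: one log-odds coefficient per group
logmod <- glm(cbind(x, n - x) ~ 0 + G, family = binomial, data = my.df)

# plogis() is the inverse logit, 1 / (1 + exp(-eta)); being monotone, it maps
# the endpoints of a log-odds interval to the endpoints of a probability interval
prop.hat <- plogis(coef(logmod))     # reproduces the observed proportions
prop.ci  <- plogis(confint(logmod))  # profile-likelihood intervals, as above
round(cbind(estimate = prop.hat, prop.ci), 4)
```

Applied to the intervals printed above, this gives roughly 0.13 to 0.23 for G1, 0.12 to 0.20 for G2 and 0.11 to 0.21 for G3. Whether intervals about 0.1 wide are narrow enough is a subject-matter decision.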
For experiments with the code, here is a textual representation of yourtable:
dput(yourtable)
structure(c(37, 213, 55, 344, 32, 210), .Dim = 2:3, .Dimnames = list(
c("x", "total"), c("G1", "G2", "G3")))
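The simulation route to a sample size could look roughly like the sketch below. Everything beyond the data is an assumption: it treats the observed proportions as the true ones, uses Wald intervals on the log-odds scale (for this saturated model these coincide with confint.default(), not with the profile intervals from confint()), and takes "close enough" to mean that every group's back-transformed 95% interval is narrower than a hypothetical width delta with probability at least target. The names delta, target, ci.width() and prob.narrow() are illustrative only.

```r
# Hypothetical planning inputs (assumptions, not taken from the question)
p.true <- c(G1 = 37/213, G2 = 55/344, G3 = 32/210)  # observed proportions treated as truth
delta  <- 0.08   # largest acceptable width of a 95% CI on the probability scale
target <- 0.80   # required chance that all three CIs are narrower than delta

# Width of the back-transformed 95% Wald CI for x successes out of n;
# for the saturated logistic model this matches confint.default()
ci.width <- function(x, n, level = 0.95) {
  z   <- qnorm(1 - (1 - level) / 2)
  eta <- qlogis(x / n)                 # estimated log-odds
  se  <- sqrt(1 / x + 1 / (n - x))     # standard error of the log-odds
  plogis(eta + z * se) - plogis(eta - z * se)
}

# Share of simulated experiments, n per group, in which every group's CI is narrower than delta
prob.narrow <- function(n, p = p.true, nsim = 2000) {
  ok <- replicate(nsim, {
    x <- rbinom(length(p), size = n, prob = p)
    # guard against 0 or n successes, where the log-odds is infinite
    all(x > 0 & x < n) && all(ci.width(x, n) < delta)
  })
  mean(ok)
}

set.seed(1)
n.grid <- seq(200, 1000, by = 100)
prob   <- sapply(n.grid, prob.narrow)
print(data.frame(n.per.group = n.grid, prob.all.narrow = prob))
n.min  <- n.grid[which(prob >= target)[1]]   # NA if no n in the grid is large enough
n.min
```

A looser delta or a lower target shrinks the required group size; the grid, delta and target should be replaced by values that reflect what "equal enough" means in your application.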

We used the overlap of the CIs as a measure of equality, but I'm starting to think this is actually wrong... Could you explain how to "back-transform the CI to the probability scale", and can this be translated into a minimal required sample size? Thanks! – OrenIshShalom Mar 16 '19 at 17:16