
We are trying to prove that three groups have the same proportion of some effect. We are struggling with formulating the null hypothesis and with deciding whether the sample size is big enough. The proportions are 37/213, 55/344 and 32/210. This seems like a very common scenario, but a naive Google search was not that helpful.

kjetil b halvorsen

1 Answer


Now that you have the data, it is somewhat late for a power analysis, except to plan a future confirmation experiment. The hypothesis of equal proportions in three groups is the hypothesis of homogeneity in a contingency table. Your data are (I am using R):

       yourtable
       G1  G2  G3
x      37  55  32
total 213 344 210

The proportions are

sapply(as.data.frame(yourtable), function(x) x[1]/x[2])
       G1        G2        G3 
0.1737089 0.1598837 0.1523810 

For the homogeneity test we need this in the form of a contingency table:

con.table <- yourtable
con.table[2, ] <- con.table[2, ]-con.table[1, ] 
row.names(con.table) <- c("x", "n-x")

chisq.test(con.table)

    Pearson's Chi-squared test

data:  con.table
X-squared = 0.36957, df = 2, p-value = 0.8313  
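
As a cross-check, the same homogeneity test can be run directly from the counts and totals with prop.test(), without building the contingency table by hand. This is only a sketch using the numbers from the question; the variable names are illustrative. For more than two groups no continuity correction is applied, so the statistic should match the Pearson test above.

```r
# Successes and group sizes from the question (names are illustrative)
x_counts <- c(G1 = 37, G2 = 55, G3 = 32)
n_totals <- c(G1 = 213, G2 = 344, G3 = 210)

# Test of equal proportions across the three groups
pt <- prop.test(x_counts, n_totals)
pt$statistic  # Pearson chi-squared statistic
pt$parameter  # degrees of freedom
pt$p.value
```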

So there is no evidence against homogeneity, but what you need is evidence for homogeneity: a non-significant test does not by itself show that the proportions are equal. I would first go for confidence intervals, which can be calculated via logistic regression:

my.df <- as.data.frame(t(con.table))
my.df$G <- rownames(my.df)

logmod <- glm(cbind(x, `n-x`) ~ 0 + G, family = binomial, data = my.df)
logmod

Call:  glm(formula = cbind(x, `n-x`) ~ 0 + G, family = binomial, data = my.df)

Coefficients:
   GG1     GG2     GG3  
-1.560  -1.659  -1.716  

Degrees of Freedom: 3 Total (i.e. Null);  0 Residual
Null Deviance:      385 
Residual Deviance: -3.553e-14   AIC: 22.08

confint(logmod)
Waiting for profiling to be done...
        2.5 %    97.5 %
GG1 -1.928812 -1.217750
GG2 -1.957458 -1.379635
GG3 -2.110139 -1.354896

The coefficients and intervals above are on the logit scale. You could back-transform the confidence intervals to the probability scale and see whether they are close enough for your purposes. Otherwise you would need to collect more data. Sample size calculations could be based on simulations from this model; see also the ideas in "Experimental Design on Testing Proportions" and "Sample size for logistic regression?".
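
A minimal sketch of the back-transformation, using the counts from the question. The inverse of the logit link is plogis() in R. The data frame, the object names and the column name `fail` are my own illustrative choices, and confint.default() gives Wald intervals, so the result will differ slightly from the profile-likelihood intervals printed above.

```r
# Counts from the question; `fail` (= total - x) is an illustrative column name
my.df2 <- data.frame(G = c("G1", "G2", "G3"),
                     x = c(37, 55, 32),
                     total = c(213, 344, 210))
my.df2$fail <- my.df2$total - my.df2$x

# Same model as above: one logit-scale coefficient per group, no intercept
logmod2 <- glm(cbind(x, fail) ~ 0 + G, family = binomial, data = my.df2)

# Wald intervals on the logit scale (swapped in for the profile intervals)
ci_logit <- confint.default(logmod2)

# plogis() is the inverse logit, exp(z) / (1 + exp(z))
ci_prob  <- plogis(ci_logit)
est_prob <- plogis(coef(logmod2))
cbind(estimate = est_prob, ci_prob)
```

Each row then gives a group's estimated proportion with its interval on the probability scale, which is easier to judge against whatever difference would matter in practice.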

For experiments with the code, here is a textual representation of yourtable:

 dput(yourtable)
structure(c(37, 213, 55, 344, 32, 210), .Dim = 2:3, .Dimnames = list(
    c("x", "total"), c("G1", "G2", "G3")))
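
One way to experiment with sample size by simulation is sketched below. It assumes that all groups share the pooled proportion from the data, and it uses a hypothetical "close enough" margin of 0.05 for the difference between two groups; that margin, the function name and the candidate sample sizes are assumptions for illustration, not something given in the question. For each per-group n it estimates how often a 95% Wald interval for the difference of two proportions falls entirely inside the margin.

```r
set.seed(1)
p_common <- (37 + 55 + 32) / (213 + 344 + 210)  # pooled proportion from the data
margin   <- 0.05   # hypothetical equivalence margin (an assumption)
n_sims   <- 2000   # simulated experiments per sample size

# Fraction of simulations in which the 95% Wald interval for p1 - p2
# lies entirely within (-margin, margin)
ci_within_margin <- function(n, p, margin, n_sims) {
  p1 <- rbinom(n_sims, n, p) / n
  p2 <- rbinom(n_sims, n, p) / n
  se   <- sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
  half <- qnorm(0.975) * se
  d    <- p1 - p2
  mean(d - half > -margin & d + half < margin)
}

n_candidates <- c(200, 500, 1000, 2000)  # per-group sample sizes to try
power_est <- sapply(n_candidates, ci_within_margin,
                    p = p_common, margin = margin, n_sims = n_sims)
names(power_est) <- n_candidates
power_est
```

The smallest n for which this fraction is acceptably high is a rough guide to the sample size needed under these assumptions; a tighter margin will demand considerably more data.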
kjetil b halvorsen
  • We used the overlap of CI as a measure of equality, but I'm starting to think this is actually wrong ... could you explain how to "back-transform CI to probability scale", and can this be translated to a minimal sample size needed? thanks! – OrenIshShalom Mar 16 '19 at 17:16