Here are three tests commonly used to compare binomial proportions, each appropriate in particular circumstances.
Consider fictitious data sampled in R, with sample sizes $n = 300$ per group and true
success rates $p_a = 0.6$ and $p_b = 0.7.$ All three tests detect a
difference, with P-values of about $0.001.$

    set.seed(731)
    n = 300
    x.a = rbinom(n, 1, .60)
    x.b = rbinom(n, 1, .70)
    table(x.a)
    x.a
      0   1
    124 176
    table(x.b)
    x.b
      0   1
     85 215

Welch t test: Appropriate for large $n.$

    t.test(x.a, x.b)
            Welch Two Sample t-test
    data:  x.a and x.b
    t = -3.3677, df = 593.35, p-value = 0.0008071
    alternative hypothesis:
      true difference in means is not equal to 0
    95 percent confidence interval:
     -0.20581319 -0.05418681
    sample estimates:
    mean of x mean of y
    0.5866667 0.7166667

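To see where the numbers in this output come from, the Welch statistic can be computed by hand. For 0/1 data it is essentially the two-proportion z statistic with unpooled variances, which is why the t test is reasonable for large $n.$ This is only a sketch: the 0/1 vectors are rebuilt from the tabulated counts rather than re-sampled, and the names `t.manual`, `df.manual`, and `p.manual` are illustrative.

```r
# Rebuild the 0/1 samples from the tabulated counts (124/176 and 85/215).
x.a = rep(c(0, 1), times = c(124, 176))
x.b = rep(c(0, 1), times = c(85, 215))
n.a = length(x.a); n.b = length(x.b)

# Per-group variance of the sample mean.
va = var(x.a) / n.a
vb = var(x.b) / n.b

# Welch statistic: difference in means over the unpooled standard error.
t.manual = (mean(x.a) - mean(x.b)) / sqrt(va + vb)

# Welch-Satterthwaite approximate degrees of freedom.
df.manual = (va + vb)^2 / (va^2 / (n.a - 1) + vb^2 / (n.b - 1))

# Two-sided P-value from Student's t distribution.
p.manual = 2 * pt(-abs(t.manual), df.manual)
c(t = t.manual, df = df.manual, p = p.manual)
```

These values should agree with the `t.test` output above up to rounding.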
A test of binomial proportions using a normal approximation, equivalent to a chi-squared test on a $2 \times 2$ table. [The continuity correction is slightly conservative and may not be needed for $n$ as large as $300$.]

    yes = c(sum(x.a),sum(x.b)); yes
    [1] 176 215
    prop.test(yes, c(n,n))
            2-sample test for equality of proportions
            with continuity correction
    data:  yes out of c(n, n)
    X-squared = 10.602, df = 1, p-value = 0.00113
    alternative hypothesis: two.sided
    95 percent confidence interval:
     -0.20886568 -0.05113432
    sample estimates:
       prop 1    prop 2
    0.5866667 0.7166667

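As a rough illustration of the two remarks above, the following sketch runs `prop.test` with and without the continuity correction and compares the uncorrected version with `chisq.test` on the corresponding $2 \times 2$ table. The counts are typed in from the output above, and the object names (`pt.corr`, `pt.nocorr`, `cs.nocorr`) are illustrative.

```r
yes = c(176, 215); n = 300
TBL = cbind(yes, no = n - yes)      # 2 x 2 table of successes and failures

# Two-proportion test with and without Yates' continuity correction.
pt.corr   = prop.test(yes, c(n, n))                    # default: correct = TRUE
pt.nocorr = prop.test(yes, c(n, n), correct = FALSE)

# Pearson chi-squared test on the same table, without correction.
cs.nocorr = chisq.test(TBL, correct = FALSE)

# The uncorrected prop.test and chisq.test statistics should coincide.
c(prop.corrected   = unname(pt.corr$statistic),
  prop.uncorrected = unname(pt.nocorr$statistic),
  chisq            = unname(cs.nocorr$statistic))
c(corrected = pt.corr$p.value, uncorrected = pt.nocorr$p.value)
```

Dropping the correction gives a slightly larger test statistic and a slightly smaller P-value, which is the sense in which the corrected test is conservative.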
Fisher's Exact Test, which uses a hypergeometric distribution based
on the row and column totals of a $2 \times 2$ table under the null hypothesis
that the two groups have equal success probabilities. [It is especially useful if $n$ is small.]

    TBL = cbind(yes, n-yes); TBL
         yes
    [1,] 176 124
    [2,] 215  85
    fisher.test(TBL)
            Fisher's Exact Test for Count Data
    data:  TBL
    p-value = 0.001105
    alternative hypothesis:
      true odds ratio is not equal to 1
    95 percent confidence interval:
     0.3932415 0.7998073
    sample estimates:
    odds ratio
     0.5616766

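The hypergeometric calculation behind this P-value can be sketched directly with `dhyper`. With all margins fixed, the number of successes in group a is hypergeometric under the null hypothesis, and the two-sided P-value adds up the probabilities of all tables no more likely than the observed one. The counts are typed in from the output above, the names `support`, `probs`, `p.obs`, and `p.manual` are illustrative, and the small relative tolerance is an assumption made to guard against floating-point ties.

```r
yes = c(176, 215); n = 300
k = sum(yes)                        # total successes in both groups: 391

# Possible success counts in group a, given the fixed margins.
support = max(0, k - n):min(k, n)

# Null distribution: n subjects in group a, n in group b, and k successes
# distributed among them; x counts the successes that fall in group a.
probs = dhyper(support, n, n, k)
p.obs = dhyper(yes[1], n, n, k)     # probability of the observed table

# Two-sided P-value: total probability of tables no more likely than observed.
p.manual = sum(probs[probs <= p.obs * (1 + 1e-7)])
p.manual
```

The result should match the P-value reported by `fisher.test(TBL)` above.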
A brief simulation in R shows that the Welch t test has power of about 99.9% to detect a difference
between $p_a = 0.5$ and $p_b = 0.7$ (a larger difference than in the example above) with sample sizes $n=300.$
Similar simulations can find the power of the other two tests for
appropriate sample sizes.

    set.seed(2021)
    n = 300; p.a = 0.5; p.b = 0.7
    pv = replicate(10^5,
           t.test(rbinom(n,1,p.a),rbinom(n,1,p.b))$p.value)
    mean(pv <= 0.05)
    [1] 0.99898   # approximate power

For $n = 150$ and the same proportions, the power is about 95%.
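
A similar simulation for the other two tests might look like the sketch below, using the settings p.a = 0.5, p.b = 0.7, and n = 150. Several choices here are assumptions made for illustration: the seed is arbitrary, the number of replications `m` is kept small for speed (so the estimates are rougher than with $10^5$ replications), and success counts are drawn directly with `rbinom(1, n, p)` instead of building 0/1 vectors.

```r
set.seed(2021)                      # arbitrary seed, for reproducibility only
n = 150; p.a = 0.5; p.b = 0.7
m = 2000                            # replications (assumed; increase for accuracy)

pv = replicate(m, {
  yes = c(rbinom(1, n, p.a), rbinom(1, n, p.b))    # success counts per group
  c(prop   = prop.test(yes, c(n, n))$p.value,      # with continuity correction
    fisher = fisher.test(cbind(yes, n - yes))$p.value)
})

# Proportion of rejections at the 5% level approximates each test's power.
power = rowMeans(pv <= 0.05)
power
```

With only 2000 replications, the simulation error in each estimate is roughly half a percentage point.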
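Note: Python offers equivalent procedures, for example scipy.stats.ttest_ind with equal_var=False for the Welch t test, scipy.stats.chi2_contingency for the chi-squared test on a $2 \times 2$ table, and scipy.stats.fisher_exact for Fisher's Exact Test.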