
For a df that looks something like the following:

group     signedup
A                  1
B                  1
A                  1
B                  1
B                  0
B                  0
A                  0

I need to calculate the difference in means between group A and group B for the 'signedup' attribute. I'm not sure if my solution is correct. Any insights would be appreciated!

Some background information: 'group' indicates whether the user is assigned to the control group (A) or the treatment group (B). 'signedup' indicates whether the user signed up for the premium service or not, with a 1 or 0, respectively.

import pandas as pd
from scipy import stats as scs

df = pd.read_csv(filename)

# Select each group's 'signedup' column as a Series (not a one-column
# DataFrame) so that ttest_ind returns scalar statistics
signedup_A = df.loc[df['group'] == 'A', 'signedup']
signedup_B = df.loc[df['group'] == 'B', 'signedup']

t, p = scs.ttest_ind(signedup_A, signedup_B)
if p < 0.05:
    print('Difference in means is statistically significant')
  • Welcome to Cross Validated! Is this a statistics question disguised as a coding question? If you just want to know about your Python code, 1) at a glance, it looks right and 2) pure coding questions are off-topic here, so a thorough debug or code review is not for Cross Validated. // For reasons discussed [here](https://stats.stackexchange.com/questions/535683/p-values-from-t-test-and-prop-test-differ-considerably/535685#535685), the t-test is not ideal here. I am a fan of the G-test, though I do not know a canned Python function. The chi-squared test (similar) probably is in scipy. – Dave Jul 31 '21 at 18:56
  • Unrelated, is it common to import stats as scs like importing numpy as no and pandas as pd? I have not done that, though most of my statistics work is in R, while I use Python for my data wrangling. – Dave Jul 31 '21 at 18:58
  • Hi Dave, I was asking if the use of a T test is correct for difference in means. – freshman_2021 Jul 31 '21 at 18:59
  • and yes it is totally acceptable to import stats the way I have done – freshman_2021 Jul 31 '21 at 19:01
  • My linked answer addresses the issue of t-testing binary variables like you have. In summary, you can do better than the t-test. – Dave Jul 31 '21 at 19:06
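For what it's worth, the G-test mentioned in the comments does appear to be available in scipy: `chi2_contingency` computes the G statistic when called with `lambda_="log-likelihood"`. A minimal sketch, using made-up counts for illustration:

```python
# G-test (log-likelihood ratio test) on a 2x2 table of counts,
# via scipy's chi2_contingency with lambda_="log-likelihood".
# The counts below are hypothetical, for illustration only.
from scipy.stats import chi2_contingency

#           signed up   did not
table = [[176, 124],   # group A (hypothetical counts)
         [215,  85]]   # group B

g, p, dof, expected = chi2_contingency(table, lambda_="log-likelihood")
print(f"G = {g:.3f}, dof = {dof}, p = {p:.4g}")
```

Dropping `lambda_` gives the ordinary chi-squared test on the same table.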

1 Answer


Here are three tests that are commonly used to compare binomial proportions, in appropriate circumstances.

Consider fictitious data sampled in R, based on sample sizes $n = 300$ and actual success rates $p_a = 0.6, p_b = 0.7.$ All three tests detect a difference with P-values about $0.001.$

set.seed(731)
n = 300
x.a = rbinom(n, 1, .60)
x.b = rbinom(n, 1, .70)

table(x.a)
x.a
  0   1 
124 176 
table(x.b)
x.b
  0   1 
 85 215 

Welch t test: Appropriate for large $n.$

t.test(x.a, x.b)

        Welch Two Sample t-test

data:  x.a and x.b
t = -3.3677, df = 593.35, p-value = 0.0008071
alternative hypothesis: 
  true difference in means is not equal to 0
 95 percent confidence interval:
  -0.20581319 -0.05418681
sample estimates:
 mean of x mean of y 
 0.5866667 0.7166667 

A test of binomial proportions using a normal approximation, similar to a chi-squared test on a $2 \times 2$ table. [The continuity correction is slightly conservative and may not be needed for $n$ as large as $300.$]

yes = c(sum(x.a),sum(x.b));  yes
[1] 176 215
prop.test(yes, c(n,n))

        2-sample test for equality of proportions 
        with continuity correction

data:  yes out of c(n, n)
X-squared = 10.602, df = 1, p-value = 0.00113
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.20886568 -0.05113432
sample estimates:
   prop 1    prop 2 
0.5866667 0.7166667 

Fisher's Exact Test uses a hypergeometric distribution based on the row and column totals of a $2 \times 2$ table, under the null hypothesis that the two groups have equal success probabilities. [It is especially useful if $n$ is small.]

TBL = cbind(yes, n-yes);  TBL
     yes    
[1,] 176 124
[2,] 215  85
fisher.test(TBL)

        Fisher's Exact Test for Count Data

data:  TBL
p-value = 0.001105
alternative hypothesis: 
 true odds ratio is not equal to 1
95 percent confidence interval:
 0.3932415 0.7998073
sample estimates:
odds ratio 
 0.5616766 

A brief simulation in R shows that the Welch t test has power of about 99% for detecting a difference between $p.a = 0.5$ and $p.b = 0.7$ with sample sizes $n = 300.$ Similar simulations can find the power of the other two tests for appropriate sample sizes.

set.seed(2021)
n = 300; p.a = 0.5; p.b = 0.7
pv = replicate(10^5, 
               t.test(rbinom(n,1,p.a),rbinom(n,1,p.b))$p.val)
mean(pv <= 0.05)
[1] 0.99898  # approximate power

For $n = 150$ and the same proportions, the power is about 95%.
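A rough Python analogue of the simulation above might look as follows (the seed and the reduced replicate count are arbitrary choices, so the power estimate will differ slightly from the R run):

```python
# Monte Carlo estimate of the Welch t test's power for detecting a
# difference between Bernoulli(0.5) and Bernoulli(0.7) samples of
# size 300, mirroring the R simulation with fewer replicates.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2021)
n, p_a, p_b, reps = 300, 0.5, 0.7, 2000

pv = np.array([
    stats.ttest_ind(rng.binomial(1, p_a, n),
                    rng.binomial(1, p_b, n),
                    equal_var=False).pvalue   # Welch version
    for _ in range(reps)
])
print((pv <= 0.05).mean())   # approximate power, near 1
```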

Note: Python must have equivalent procedures for such tests.
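For instance, scipy covers rough counterparts of all three R tests; a sketch applied to the same counts (176 of 300 vs. 215 of 300), where `chi2_contingency` with the continuity correction stands in for R's `prop.test`:

```python
# scipy counterparts of the three tests, on the same counts as above.
import numpy as np
from scipy import stats

x_a = np.r_[np.ones(176), np.zeros(124)]
x_b = np.r_[np.ones(215), np.zeros(85)]
table = [[176, 124], [215, 85]]

# Welch t test (equal_var=False matches R's t.test default):
t, p_t = stats.ttest_ind(x_a, x_b, equal_var=False)

# Chi-squared test with continuity correction (analogue of prop.test):
chi2, p_chi2, dof, _ = stats.chi2_contingency(table, correction=True)

# Fisher's exact test on the 2x2 table:
odds, p_fisher = stats.fisher_exact(table)

print(p_t, p_chi2, p_fisher)  # all near 0.001, as in the R output
```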

BruceET