Test for significance of difference of ratios across subpopulations

Question

I am working in a setup with two binary independent variables. One is experimental, $T$: treated vs. not treated. The other is a feature $F$ that I expect affects how strongly treatment affects outcome. The outcome $S$ is a Bernoulli random variable with very low $p$ (on the order of 0.01-1.0%). I do, however, have at least tens of thousands of trials and hundreds to thousands of successes for each of the 4 subpopulations.

The goal is to compute the effect of treatment $T$ and determine whether this is different depending on $F$. More precisely, I want to compute the lift in the outcome caused by $T$ for each scenario:

\begin{align} \newcommand{\lift}{{\rm lift}} \lift_0 &= \frac{P(S|T=1,F=0)-P(S|T=0,F=0)}{P(S|T=0,F=0)} \\[10pt] \lift_1 &= \frac{P(S|T=1,F=1)-P(S|T=0,F=1)}{P(S|T=0,F=1)} \end{align}

Based on this question, I can compute whether each lift is significantly different from zero. But I'd like to take this a step further and determine whether the lifts are statistically different from each other. How can I think about this problem? It seems there may be some connection to difference-in-differences, but I'm computing lift (not difference) and my two control groups are not necessarily similar, so I'm not sure how well that applies.

A concrete example may help the discussion, so here are some numbers to work from:

+---------+-----------+---------+-----------+--------+
| Feature | Treatment | Trials  | Successes |    p   |
+---------+-----------+---------+-----------+--------+
| No      | No        | 4169157 |      1064 | 0.026% |
| No      | Yes       | 2892839 |       794 | 0.027% |
| Yes     | No        |  577625 |       951 | 0.165% |
| Yes     | Yes       |  823158 |      2260 | 0.275% |
+---------+-----------+---------+-----------+--------+

The two lifts therefore are $\lift_0=7.5$% and $\lift_1=67$%. At what significance level are these different?
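For reference, a minimal R sketch that reproduces these two figures from the table. The vector and cell names (`F0T0` and so on) are only illustrative labels, not part of any existing code:

```r
# Counts copied from the table above; names encode Feature (F) and Treatment (T).
trials    <- c(F0T0 = 4169157, F0T1 = 2892839, F1T0 = 577625, F1T1 = 823158)
successes <- c(F0T0 = 1064,    F0T1 = 794,     F1T0 = 951,    F1T1 = 2260)

# Observed success rate in each of the four cells.
p <- successes / trials

# lift = (treated rate - control rate) / control rate, within each feature level.
lift <- c(
  lift_0 = unname((p["F0T1"] - p["F0T0"]) / p["F0T0"]),
  lift_1 = unname((p["F1T1"] - p["F1T0"]) / p["F1T0"])
)

# Print the lifts as percentages.
print(round(100 * lift, 1))
```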

score 1 · Answer 1 · answered Jul 08 '17 at 17:17

There are many methods for analyzing data with a binary outcome, but logistic regression is a good default, and nothing about your setup rules it out.

You want to know if "F... affects how strongly treatment affects outcome". In statistical terms, that is asking if there is an interaction between the feature and the treatment.
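Written out, with $T$ and $F$ coded 0/1, the model with an interaction looks like this. The $\beta$ symbols are notation introduced here for illustration; they do not appear in the question:

\begin{align} \log\frac{P(S \mid T, F)}{1 - P(S \mid T, F)} &= \beta_0 + \beta_T\, T + \beta_F\, F + \beta_{TF}\, T F \\[10pt] H_0 &: \beta_{TF} = 0 \end{align}

Under this model the odds ratio for treatment is $e^{\beta_T}$ when $F=0$ and $e^{\beta_T + \beta_{TF}}$ when $F=1$, so $\beta_{TF} = 0$ says the treatment effect (on the odds scale) is the same in both groups.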

Here is an example, coded in R, for your data:

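# One row per Feature x Treatment cell, with the counts from the question
d <- read.table(text=" Feature  Treatment  Trials   Successes
                       No       No         4169157       1064
                       No       Yes        2892839        794
                       Yes      No          577625        951
                       Yes      Yes         823158       2260", header=TRUE)

# Binomial GLM on (successes, failures); Treatment*Feature expands to
# both main effects plus the Treatment:Feature interaction
m <- glm(cbind(Successes, Trials - Successes) ~ Treatment * Feature,
         family = binomial, data = d)
summary(m)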
# Call:
# glm(formula = cbind(Successes, Trials - Successes) ~ Treatment * 
#     Feature, family = binomial, data = d)
# 
# Deviance Residuals: 
# [1]  0  0  0  0
# 
# Coefficients:
#                         Estimate Std. Error  z value Pr(>|z|)    
# (Intercept)             -8.27318    0.03066 -269.828  < 2e-16 ***
# TreatmentYes             0.07279    0.04690    1.552    0.121    
# FeatureYes               1.86566    0.04465   41.787  < 2e-16 ***
# TreatmentYes:FeatureYes  0.43970    0.06080    7.232 4.77e-13 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# (Dispersion parameter for binomial family taken to be 1)
# 
#     Null deviance:  5.7554e+03  on 3  degrees of freedom
# Residual deviance: -2.8453e-12  on 0  degrees of freedom
# AIC: 43.575
# 
# Number of Fisher Scoring iterations: 3

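The test of the interaction, that is, the test of whether the effect of the treatment differs depending on whether the feature is present, is displayed in the last row of the coefficients table (TreatmentYes:FeatureYes). It is highly significant ($z = 7.23$, $p \approx 5\times10^{-13}$), so the feature does seem to make a difference to the effectiveness of the treatment. Note that the coefficient is on the log-odds scale: $e^{0.4397} \approx 1.55$ is the ratio of the two treatment odds ratios. For outcomes as rare as yours, odds ratios are nearly identical to risk ratios, so this is approximately $(1+\lift_1)/(1+\lift_0)$, and testing whether it equals 1 is effectively testing whether the two lifts are equal.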
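If you would rather test on the lift scale itself, one option is a Wald test on the difference of the two log risk ratios, since $\lift_1 = \lift_0$ exactly when $(1+\lift_1)/(1+\lift_0) = 1$. The sketch below is not part of the model above: it assumes the four cells are independent samples and uses the usual large-sample (delta-method) variance for a log risk ratio, $1/x_1 - 1/n_1 + 1/x_0 - 1/n_0$. The helper names are made up for this example.

```r
# Counts from the question's table (x = successes, n = trials).
x00 <- 1064; n00 <- 4169157  # Feature = No,  Treatment = No
x01 <- 794;  n01 <- 2892839  # Feature = No,  Treatment = Yes
x10 <- 951;  n10 <- 577625   # Feature = Yes, Treatment = No
x11 <- 2260; n11 <- 823158   # Feature = Yes, Treatment = Yes

# Hypothetical helpers: log risk ratio (treated vs. control) and its
# large-sample variance from the delta method.
log_rr     <- function(x1, n1, x0, n0) log((x1 / n1) / (x0 / n0))
var_log_rr <- function(x1, n1, x0, n0) 1 / x1 - 1 / n1 + 1 / x0 - 1 / n0

# log(1 + lift) within each feature level.
lrr0 <- log_rr(x01, n01, x00, n00)
lrr1 <- log_rr(x11, n11, x10, n10)

# Difference of log risk ratios = log of (1 + lift_1) / (1 + lift_0).
d_hat <- lrr1 - lrr0
se    <- sqrt(var_log_rr(x01, n01, x00, n00) + var_log_rr(x11, n11, x10, n10))

# Two-sided Wald test of H0: the two lifts are equal.
z       <- d_hat / se
p_value <- 2 * pnorm(-abs(z))

# Approximate 95% confidence interval for (1 + lift_1) / (1 + lift_0).
ratio_ci <- exp(d_hat + c(-1, 1) * qnorm(0.975) * se)

print(c(ratio = exp(d_hat), z = z, p_value = p_value))
print(ratio_ci)
```

With these counts the $z$ statistic comes out at about 7.2, in line with the interaction test from the logistic regression, which is what you would expect when the outcome is this rare.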
