Logistic Regression - between 2 unrelated treatments?

Question

Context: I'm testing the effects of various email interventions on getting people to sign up for a financial literacy event, and benchmarking them against a default control intervention. The outcome variable is sign-up (Yes or No), so it's a binary outcome.

For clarity:

Control (T0) Email: A default email that the sponsoring agency has been using for years. ("Please sign up!")
Treatment 1 (T1) Email: Same email, but adds an extra 'note'. (e.g., "By the way, did you know that.....")
Treatment 2 (T2) Email: Same email again, with another type of 'note'.
Treatment 3 (T3) Email: Same email again, with another type of 'note'.
Treatment 4 (T4) Email: Same email again, with another type of 'note'.

Further to that, I want to study subgroups to see how the effects pan out (namely, within each intervention, did males have a higher take-up rate? Did lower-income groups have a higher take-up rate?).

So I have done my randomization, split my pool into 5 groups, and sent each group one of the 5 emails. I'm counting responses as we speak.

Problem: My initial thought is to run a logistic regression comparing the control with each of the treatments, since the outcome is binary and I can simply code a dummy variable per treatment. However, my control is not complete inaction: there is a default message, so it is in principle a treatment itself ("T0", if you will). In that sense, running a logistic regression as-is seems ill-suited to the task, since the regression would be comparing complete inaction against a treatment, not my control against a treatment.

My ask: Could any of the folks here point me to a viable analysis method? Is there some sort of logistic regression suited for comparing two treatments (hence the title)?

Thanks.

Could you describe in plain English what the experimental and control conditions were? It is not exactly clear to me from your description, and it seems that you consider those differences to be the key to your question. — Tim, Feb 01 '22 at 12:28

score 1 · Answer 1 · answered Feb 01 '22 at 13:17


There's no problem with your proposed logistic regression: if you include one dummy variable per email and no intercept (and no other regressors), each dummy coefficient is just the log-odds of that email resulting in a success. It doesn't really matter whether your "control" is actually a control or just another email.
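A minimal sketch of that reading, assuming one dummy per email and no intercept: for that model the maximum-likelihood coefficient of each dummy equals the observed log-odds of sign-up in that arm, so it can be computed directly from counts. The counts below are hypothetical, not data from the question.

```python
import math

# Hypothetical counts: (sign-ups, emails sent) per arm. Illustrative only.
COUNTS = {
    "T0": (30, 200),
    "T1": (42, 200),
    "T2": (35, 200),
    "T3": (28, 200),
    "T4": (50, 200),
}

def log_odds(successes: int, n: int) -> float:
    """Empirical log-odds log(p / (1 - p)) for one arm."""
    p = successes / n
    return math.log(p / (1 - p))

# With one dummy per email and no intercept, each dummy's MLE coefficient
# is that arm's empirical log-odds of sign-up.
cell_means = {arm: log_odds(s, n) for arm, (s, n) in COUNTS.items()}

for arm, b in cell_means.items():
    print(f"{arm}: log-odds = {b:.3f}")
```

Note that these per-arm log-odds are not yet comparisons: the difference between two of them is the log odds ratio between those two emails.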

I'd be more concerned about comparing various subgroups within each sample. As the number of subgroups grows, so does the probability of seeing a false positive (e.g. just by chance, some subgroup will have a higher success rate on some email).
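One standard way to keep that false-positive rate in check is to adjust the p-values across every subgroup comparison you run. Below is a minimal pure-Python sketch of the Holm-Bonferroni step-down adjustment (swapped in as an illustration; the answer doesn't name a specific method). The function name `holm_adjust` and the p-values are hypothetical.

```python
def holm_adjust(p_values: list[float]) -> list[float]:
    """Holm-Bonferroni adjusted p-values, returned in the original order."""
    m = len(p_values)
    # Indices of the p-values, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (rank starting at 0) is multiplied by
        # (m - rank) and capped at 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: an adjusted p-value can't be smaller than
        # the one before it in the sorted order.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Hypothetical raw p-values from eight subgroup-by-email comparisons.
raw = [0.004, 0.030, 0.012, 0.200, 0.045, 0.650, 0.021, 0.080]
adj = holm_adjust(raw)
for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f} -> Holm-adjusted p = {q:.3f}")
```

Holm controls the family-wise error rate and is at least as powerful as plain Bonferroni; if you care about the false discovery rate instead, Benjamini-Hochberg is the usual alternative.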


atlasd


Thanks for responding! So am I right to say that there is no means of direct comparison between my control and a treatment? If I get the log(odds) of treatment 4, for instance, that just tells me the probability of success given treatment 4, and the log(odds) of treatment 0 tells me the probability of success given treatment 0. And I'll just eyeball the numbers and draw some conclusion from that? – statsnoobj101 Feb 01 '22 at 13:54
You can do that, but you probably want to make sure the log-odds for the email you select are significantly higher than those of the existing email (unless there's no cost to changing the email, in which case you can just pick the higher one). There's quite a science to A/B testing, so depending on how deep you want to go and the costs of moving between emails, there's a lot you can do here. – atlasd Feb 01 '22 at 13:59
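To check whether a chosen email is significantly better than the existing one, a minimal sketch is a one-sided Wald test on the log odds ratio between the two arms. For a logistic regression containing only the treatment dummies with T0 as the reference, this gives the same estimate and standard error as the Wald test on that dummy's coefficient. The helper `wald_log_or_test` and the counts are hypothetical, and the formula assumes no arm has zero sign-ups or zero non-sign-ups.

```python
import math

def wald_log_or_test(s_t: int, n_t: int, s_c: int, n_c: int) -> tuple[float, float, float]:
    """Wald test of the log odds ratio, treatment vs. control.

    Returns (log_or, standard_error, one_sided_p), where the one-sided
    p-value is for the alternative "treatment odds > control odds".
    """
    a, b = s_t, n_t - s_t  # treatment: sign-ups, non-sign-ups
    c, d = s_c, n_c - s_c  # control: sign-ups, non-sign-ups
    log_or = math.log((a * d) / (b * c))
    # Woolf standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    # Upper-tail standard-normal probability via the complementary error function.
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return log_or, se, p_one_sided

# Hypothetical counts: T4 got 50/200 sign-ups, the existing email T0 got 30/200.
log_or, se, p = wald_log_or_test(50, 200, 30, 200)
print(f"log OR = {log_or:.3f}, SE = {se:.3f}, one-sided p = {p:.4f}")
```

If you compare several emails against the control this way, the multiple-comparison concern from the answer applies to these p-values too.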
Thanks, so just to confirm: there is no way to (scientifically) and directly compare the control with any of the treatments? (Cost is of no consequence to me, so it's all a matter of rigor here.) – statsnoobj101 Feb 01 '22 at 14:39
Depends on what you mean by scientifically, but if you build a proper model of the data and choose the treatment with the largest effect, I think it's defensible. (I say "proper model" because you may well have covariates that need to be included, especially if you're looking at subgroups, but I think that's a different question at this point.) – atlasd Feb 01 '22 at 14:46

score 1 · Accepted Answer · answered Feb 01 '22 at 16:58

When you dummy code a categorical predictor, you create a dummy variable for all but one level of the treatment. The coefficient on each dummy is interpreted as the log of the odds ratio between the treatment level corresponding to that dummy and the omitted (reference) level. That is, your model will look like the following: $$ \text{log}\left(\frac{p}{1-p} \right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 $$ where $X_a$ is $1$ for being in treatment group $a$ and $0$ otherwise.
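The dummy coding behind that equation can be sketched as follows: each observation gets an intercept column plus one 0/1 column per non-reference arm, and an observation in the reference arm (T0 here) has zeros in every dummy column. The arm labels and the `dummy_row` helper are illustrative, not part of any library.

```python
ARMS = ["T0", "T1", "T2", "T3", "T4"]  # hypothetical arm labels

def dummy_row(arm: str, reference: str = "T0") -> list[int]:
    """Design-matrix row: intercept, then one 0/1 dummy per non-reference arm."""
    return [1] + [1 if arm == a else 0 for a in ARMS if a != reference]

for arm in ARMS:
    print(arm, dummy_row(arm))
```

Changing `reference` changes which arm is absorbed into the intercept, which is exactly what changes the interpretation of the remaining coefficients.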

The coefficient on $X_1$ is the log odds ratio of the outcome for being in treatment group 1 vs. treatment group 0, that is, $$ \exp(\beta_1)=\frac{\text{odds}(Y=1| \text{T1})}{\text{odds}(Y=1| \text{T0})} $$ So, each coefficient compares a treatment group to the reference group, and that reference group can be your control condition. If you had a sixth group corresponding to complete inaction, you could simply set that as the reference category, and each coefficient would correspond to the effect of being in one of the other treatment groups vs. that group. You control which group is the reference group and therefore how the coefficients are interpreted. You can also change the reference category (the one without a dummy variable) and re-run the model to get any pairwise comparison you want.
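Because this model contains only the treatment dummies, it is saturated, and its maximum-likelihood estimates have a closed form: the intercept is the reference arm's log-odds and each $\beta_a$ is the difference in log-odds between arm $a$ and the reference. The sketch below uses that to show the odds ratios against T0 and what re-referencing does (with T2 as the new reference, the T3 coefficient equals $\beta_3 - \beta_2$ from the T0-referenced fit). The counts and helper names are hypothetical; in practice you would fit the model with a GLM routine, e.g. the statsmodels formula interface with a term like `C(email, Treatment(reference="T0"))` (where `email` is an assumed column name), rather than by hand.

```python
import math

# Hypothetical counts (sign-ups, emails sent); illustrative only.
COUNTS = {"T0": (30, 200), "T1": (42, 200), "T2": (35, 200),
          "T3": (28, 200), "T4": (50, 200)}

def log_odds(s: int, n: int) -> float:
    """Empirical log-odds of sign-up: log(s / (n - s))."""
    return math.log(s / (n - s))

def coefficients(reference: str) -> dict[str, float]:
    """Closed-form MLE of the treatment-only logistic model for a given reference.

    Intercept = reference arm's log-odds; each dummy coefficient =
    that arm's log-odds minus the reference arm's log-odds.
    """
    base = log_odds(*COUNTS[reference])
    coefs = {"intercept": base}
    for arm, (s, n) in COUNTS.items():
        if arm != reference:
            coefs[arm] = log_odds(s, n) - base
    return coefs

vs_control = coefficients("T0")
for arm in ["T1", "T2", "T3", "T4"]:
    print(f"{arm} vs T0: odds ratio = {math.exp(vs_control[arm]):.3f}")

# Re-referencing to T2 gives T3 vs T2 directly; it matches beta_3 - beta_2
# from the T0-referenced coefficients.
vs_t2 = coefficients("T2")
print(f"T3 vs T2: log OR = {vs_t2['T3']:.3f} "
      f"(beta_3 - beta_2 = {vs_control['T3'] - vs_control['T2']:.3f})")
```

Re-running with a different reference doesn't change the fitted model, only which contrasts are reported directly; any pairwise log odds ratio is a difference of coefficients from a single fit.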
