I am working on a customer purchase problem. I have 150 campaigns sent by email, that I denote C0
, C1
... C149
. Moreover, for each user i
:
Cj
= 0 if campaignj
is NOT received by customeri
,Cj
= 1 if campaignj
is received by customeri
nb_campaigns = the number of campaigns received by client
i
sucess = 0 or 1 if a customer ordered something thanks to the campaigns he received
I performed a logistic regression to explain the variable sucess
with nb_campaigns
, I got the following results (with statsmodels) :
When I perform a logistic regression to explain sucess with ALL campaigns AND
nb_campaigns
, I got different coefficients for the intercept and for nb_campaigns
and on the top of that, I got the following graph :
It seems abnormal to me that the probability decreases if the number of campaigns increases. And moreover, it is not coherent with the first graph... Is it a code mistake ? I thought maybe it is because I increased the variable nb_campaign
in the loop, but in fact if this variable increases by one, then one campaign needs to increase by 1 too.
Do you know what tests are performed in the previous tab and why the coef are not the same ?
If I want to describe the effect of nb_campaign
on sucess
, is it a mistake to consider all other variables ? Is it better to test sucess
vs nb_campaigns
and sucess
vs all campaigns independently ?
Thank you for your help !
William