I'm using a Poisson GLM to model the effects of advertising (number of ads bought) on the number of sales (numConvs). If possible I'd like to use the model to get an idea of how many sales are due to one type of ad (ad1), how many are due to another type of ad (ad2), and how many were not caused by either type of ad - a 'baseline' number of sales.
Intuitively, I thought I could do this by using the model to predict how many sales I would get when the number of ads bought for one or both types of ad was zero. However, the predicted values do not add up:
#===============
# Simulate data
#===============
set.seed(1)
intercept <- 1
ad1Coef <- 0.03
ad2Coef <- 0.05
ad1 <- sample(1:50, size=100, replace=TRUE)
ad2 <- sample(1:50, size=100, replace=TRUE)
numConvs <- rpois(n=length(ad1), lambda=exp(intercept + ad1Coef*ad1 + ad2Coef*ad2))
df <- data.frame(numConvs=numConvs, ad1=ad1, ad2=ad2)
#===============
# Model & predict
#===============
# Model
mPois <- glm(numConvs ~ ad1 + ad2, data=df, family='poisson')
summary(mPois)$coef
# Predict num conversions based on number of ads in final row of data
finalRowDF <- df[nrow(df),]
finalRowDF # 43 actual conversions
library('dplyr')
predict(mPois, type='response', newdata=finalRowDF) # Predicted conversions when both ads are playing: 49
predict(mPois, type='response', newdata=mutate(finalRowDF, ad1=0, ad2=0)) # Predicted conversions when no ads are playing: 3
predict(mPois, type='response', newdata=mutate(finalRowDF, ad1=0)) # Predicted conversions when only ad2 is playing: 18
predict(mPois, type='response', newdata=mutate(finalRowDF, ad2=0)) # Predicted conversions when only ad1 is playing: 7
As shown in the code above, the model predicts that there are 49 sales when both ads are playing, 18 sales when only ad2 is playing, 7 sales when only ad1 is playing, and 3 sales when no ads are playing.
From this, I would intuitively think that:
- there was a 'baseline' number of sales of 3
- ad2 caused 18 - 3 = 15 sales
- ad1 caused 7 - 3 = 4 sales
However, these numbers add up to 22, less than half of the 49 predicted when both ads are playing. What is going on here? I get that a Poisson GLM is a multiplicative model on the response scale, so it's probably not appropriate to use my 'adding and subtracting' method above to partition effects. But even so, where do the remaining 27 predicted conversions come from? Is this some kind of interaction effect, where the effect of ad1 depends on the level of ad2? And is there some appropriate way to get a broad idea of how many sales are caused by ad1 vs ad2 vs the baseline effect?
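To make the multiplicative point concrete, here is a minimal sketch that re-creates the simulated data and model from this question and splits the final-row prediction into a baseline and two multipliers. The names baseline, mult1, mult2, gain1, gain2 and leftover are only illustrative, and this restates the model's algebra rather than proposing a definitive way to attribute sales.

```r
# Re-create the simulated data and the model from the question
set.seed(1)
ad1 <- sample(1:50, size = 100, replace = TRUE)
ad2 <- sample(1:50, size = 100, replace = TRUE)
numConvs <- rpois(n = length(ad1), lambda = exp(1 + 0.03 * ad1 + 0.05 * ad2))
df <- data.frame(numConvs = numConvs, ad1 = ad1, ad2 = ad2)
mPois <- glm(numConvs ~ ad1 + ad2, data = df, family = "poisson")

b <- coef(mPois)
finalRow <- df[nrow(df), ]

# Response-scale pieces for the final row (illustrative names)
baseline <- exp(b[["(Intercept)"]])       # predicted sales with no ads
mult1 <- exp(b[["ad1"]] * finalRow$ad1)   # factor by which ad1 scales sales
mult2 <- exp(b[["ad2"]] * finalRow$ad2)   # factor by which ad2 scales sales

predBoth <- unname(predict(mPois, type = "response", newdata = finalRow))

# The prediction with both ads is the product of the three pieces
all.equal(predBoth, baseline * mult1 * mult2)

# 'Adding and subtracting' pieces, and what they leave unexplained
gain1 <- baseline * (mult1 - 1)   # (ad1 only) minus baseline
gain2 <- baseline * (mult2 - 1)   # (ad2 only) minus baseline
leftover <- predBoth - (baseline + gain1 + gain2)

# The leftover is exactly the cross term of the product
all.equal(leftover, baseline * (mult1 - 1) * (mult2 - 1))
```

Both all.equal() calls should return TRUE. If I read this correctly, the unexplained conversions are the cross term baseline * (mult1 - 1) * (mult2 - 1): it appears on the response scale even though the model has no ad1:ad2 interaction term on the log scale.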