
Assume that I am building a churn prediction model from observational data on customers who registered in the last 12-18 months, and that 50% of those customers churned. Customers who are predicted to churn receive more favorable treatment from the business in an attempt to reduce the overall churn rate. 18 months later, I compare the predictions against reality -- whether the customers predicted to churn actually churned, and vice versa. The model predicted that 50% of the cohort of customers who registered during a particular time interval would churn, whereas in reality only 10% churned. It may be that the model is not drifting, but rather that the treatment is having an effect, thus lowering the churn rate.
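To make the scenario concrete, here is a hypothetical simulation of the gap described above. The per-group churn rates (15% for treated, 5% for untreated) are illustrative assumptions of mine, chosen only so that the overall observed rate lands near the 10% in the question:

```python
import random

random.seed(0)

# Illustrative simulation: the model flags ~50% of a 10,000-customer cohort,
# the flagged customers are treated, and only ~10% of the cohort churns.
n = 10_000
predicted, churned = 0, 0
for _ in range(n):
    flagged = random.random() < 0.5          # model predicts churn for ~50%
    predicted += flagged
    if flagged:
        churned += random.random() < 0.15    # treated group churns at ~15% (assumed)
    else:
        churned += random.random() < 0.05    # untreated group churns at ~5% (assumed)

print(f"predicted churn rate: {predicted / n:.2f}")  # ~0.50
print(f"observed churn rate:  {churned / n:.2f}")    # ~0.10
```

From the outside, this outcome is indistinguishable from a model that has simply drifted, which is exactly the problem posed below.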

This difference between the prediction and reality could be caused by the following:

  1. The treatment, which positively affects customer behavior, causes customers who were likely to churn not to. In this case, there is no evidence of concept drift; rather, the change in distribution is caused directly by the model's own intervention.

  2. The treatment is not affecting customer behavior; rather, some confounding variable that wasn't accounted for is (e.g., the quality of service offered by competitors has diminished, causing fewer of our customers to churn). In that case, I add the confounder to my data set and rebuild the model under this new assumption, and there is evidence of concept drift.
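One common way to separate the two explanations is to randomly withhold the treatment from a small holdout of predicted churners before acting on the model. The sketch below assumes such a holdout exists; all names and rates are illustrative, not from the question. If the holdout keeps churning at the old rate while treated customers do not, the treatment explains the drop (case 1); if the holdout's churn also falls, something external changed (case 2):

```python
import random

random.seed(1)

n = 10_000
holdout_frac = 0.10  # predicted churners randomly left untreated (assumed)

def simulate(control_churn, treated_churn):
    """Churn rates among treated vs. held-out predicted churners."""
    treated, control = [], []
    for _ in range(n):
        if random.random() >= 0.5:       # model predicts churn for ~50%
            continue
        if random.random() < holdout_frac:
            control.append(random.random() < control_churn)
        else:
            treated.append(random.random() < treated_churn)
    return sum(treated) / len(treated), sum(control) / len(control)

# Case 1: the treatment works, so the untreated holdout still churns
# at the old high rate while treated customers churn far less.
t1, c1 = simulate(control_churn=0.50, treated_churn=0.10)

# Case 2: a confounder (e.g., weaker competitors) lowered churn for
# everyone, so the holdout churns at the new low rate too -- drift.
t2, c2 = simulate(control_churn=0.12, treated_churn=0.10)

print(f"treatment effect: treated={t1:.2f}, holdout={c1:.2f}")
print(f"concept drift:    treated={t2:.2f}, holdout={c2:.2f}")
```

The comparison of treated vs. held-out churn rates is the distinguishing statistic; without some randomized withholding of the treatment, the two cases produce identical observed data.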

How can one distinguish between the two in order to attribute the change in distribution to one cause or the other?

Jay Ekosanmi

Comments:
  • Did you include the initial predicted probability of churn and/or being treated as features in the second-period model? – dimitriy Jan 20 '22 at 21:47
  • More broadly, you may find the Ascarza paper linked in [this answer](https://stats.stackexchange.com/a/554583/7071) interesting for your project. – dimitriy Jan 20 '22 at 21:56

0 Answers