Carrying Out Interventions Based on ML "Feature Importances"

Question

Recently, I have been studying causal inference and have come to a bit of a crossroads with respect to making decisions based on the analysis of data (especially in a business/industry setting). Specifically, I am referring to common problems like "churn modelling", segmentation, and lifetime value problems where the goal is to figure out specific demographics to "target" to increase revenue or to decrease churn, etc.

Often, I see these problems solved in the following way (whether good or bad). Take a bunch of predictors that are plausibly associated in some way with the outcome variable (whether that is churn, lifetime value, or some other profitability metric) and then fit a machine learning model to the problem (using the standard test sets/data splitting, etc.). Then, look at the feature importances of the best predictive model (perhaps using a method that corrects for multicollinearity, like SHAP scores) and determine the most impactful features, which we take to be the most predictive variables. We can then make decisions on who to target, market to, etc. based on these influential variables.

Now, we know that none of this is causal in any way since we are just exploiting correlations. We didn't consider the actual causal structure of the problem, draw out DAGs like Pearl suggests, and condition on sufficient adjustment sets to derive causal effects (and ultimately see the impact of "treatments"). Through careful handling of causality, we can deal with issues that may arise from the above approach, like Simpson's paradox.
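To make the Simpson's paradox point concrete, here is a minimal simulated sketch. Everything in it is an assumption for illustration: the variable names (`segment`, `treatment`, `outcome`), the coefficients, and the linear structure are invented, not taken from any real dataset. A confounder drives both the treatment and the outcome, so the unadjusted slope has the opposite sign from the true effect, and conditioning on the confounder recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical confounder: affects both who gets treated and the outcome.
segment = rng.normal(size=n)
treatment = segment + rng.normal(size=n)
# Assumed true causal effect of treatment on outcome is +0.5.
outcome = 0.5 * treatment - 2.0 * segment + rng.normal(size=n)


def ols_slopes(y, *columns):
    """Least-squares slopes of y on the given columns (intercept dropped)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


# Ignoring the confounder: the association has the wrong sign.
naive_effect = ols_slopes(outcome, treatment)[0]
# Conditioning on the adjustment set {segment}: close to the assumed +0.5.
adjusted_effect = ols_slopes(outcome, treatment, segment)[0]

print(f"unadjusted slope: {naive_effect:.2f}")
print(f"adjusted slope:   {adjusted_effect:.2f}")
```

In this toy setup adjusting helps, but only because the simulated structure makes `segment` a confounder; with a different structure, adjusting for a variable can hurt instead.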

My question is as follows: is the first method of modelling, and ultimately, the business decisions made from the first method, incorrect or dangerous? Equivalently, is absolute causality needed in this setting? I can see why this may be the case, but in, say, a huge dataset with many predictors and proper regularization, I have a tough time believing that the ML approach would lead to outright bad decisions (though perhaps the conclusions would not be quite as strong). In addition, I think many would agree that the first method is less time-consuming. Writing out a causal model is difficult, especially when there is a lack of expertise.

Excellent question +1. I'll offer a quick comment and say that the former method is strictly wrong; however, the risks are not all that bad in most settings. Maybe I can add an answer when I have a free moment. — Demetri Pananos, Nov 19 '20 at 03:26
@DemetriPananos Can you elaborate on strictly wrong? Like, it's "technically incorrect, but can still be useful and/or helpful?". To add more context - this came up at my workplace. I approached a problem where the task was to basically come up with data-driven recommendations to improve customer experience based on survey responses. I used a Bayesian ordinal regression model and included other observed variables in the data that I thought may confound the analysis - my superior used a full causal DAG and called my analysis "purely predictive". I never claimed causality (just correlations)... — aranglol, Nov 19 '20 at 04:29
...but I figured that the analysis would still be useful for making decisions. I would be highly appreciative of a full response! — aranglol, Nov 19 '20 at 04:30

score 1 · Accepted Answer · answered Nov 19 '20 at 04:54

Considering this came up in a work context, perhaps there was some confusion on what the goal of the analysis was.

If the goal is to identify who to market to, namely the people likely to click -- as in the case of uplift analysis or similar -- then a predictive model should be fine. In uplift, the goal is to target only those people who are likely to open the email/click the ad/whatever. The mechanism of why they clicked is irrelevant. You just want to know who is most likely to click, and that is a prediction problem.

If, however, the goal is to take a customer who is unlikely to click and intervene on them in such a way as to cause them to click the ad, then a causal approach is needed. "Data-driven recommendations to improve customer experience" seems causal to me, at least in the way you've written it, so I'm willing to think this is the context we find ourselves in.
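The difference between the two goals can be sketched with a small simulation. This is only an illustration under made-up assumptions: `interest` is a hypothetical unobserved trait, `opened_last_email` is a hypothetical feature, and the click mechanism is invented so that the feature predicts clicks without causing them.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical unobserved trait that drives both the feature and clicking.
interest = rng.normal(size=n)
opened_last_email = (interest + rng.normal(size=n) > 0).astype(int)

# Assumed mechanism: clicking depends on interest only, not on the feature.
click_prob = 1.0 / (1.0 + np.exp(-(interest - 1.0)))
clicked = rng.random(n) < click_prob

# Prediction: the feature separates likely clickers from unlikely ones.
lift_prediction = (
    clicked[opened_last_email == 1].mean()
    - clicked[opened_last_email == 0].mean()
)

# Intervention: set the feature at random (e.g. by somehow forcing an open).
# Interest is untouched, so click behaviour follows the same mechanism.
forced_open = rng.integers(0, 2, size=n)
clicked_after = rng.random(n) < click_prob
lift_intervention = (
    clicked_after[forced_open == 1].mean()
    - clicked_after[forced_open == 0].mean()
)

print(f"click-rate gap when the feature is observed: {lift_prediction:.3f}")
print(f"click-rate gap when the feature is forced:   {lift_intervention:.3f}")
```

Under these assumptions, ranking customers by the feature is a perfectly good way to find likely clickers, while changing the feature does nothing to click rates. A purely predictive model cannot tell these two situations apart.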

OK, but that doesn't answer the question. Why should we draw DAGs and do our causal analysis this way rather than just throw everything in a regression model? Richard McElreath gives some pretty compelling examples of why "Causal Salad" -- his pejorative name for throwing everything in a linear or machine learning model -- doesn't work. In chapters 5 & 6 of Statistical Rethinking, Richard gives several examples through simulation in which the true causal mechanism is poorly estimated when you don't draw the DAG. I won't take the time to regurgitate those examples here as I wouldn't do them justice.

Suffice it to say, you can very easily think your intervention is helpful when in reality it is hurtful if you don't take the time to draw your assumptions before your conclusions. So your approach is technically wrong, but the danger is presently unknown. For example, assume you estimated a positive treatment effect but in reality the effect was null. Nothing gained, nothing lost -- except money.
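As a rough sketch of that last scenario (this is not one of McElreath's examples, just a toy simulation in the same spirit, with invented variable names and coefficients): the treatment has no effect on the outcome, but both feed into a third variable. Throwing that third variable into the regression produces a clearly positive "treatment effect" that does not exist.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

treatment = rng.normal(size=n)
# Assumed truth: the outcome does not depend on the treatment at all.
outcome = rng.normal(size=n)
# Hypothetical collider: caused by both the treatment and the outcome.
collider = outcome - treatment + rng.normal(size=n)


def ols_slopes(y, *columns):
    """Least-squares slopes of y on the given columns (intercept dropped)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


# Regression suggested by the assumed DAG: leave the collider out.
effect_dag = ols_slopes(outcome, treatment)[0]
# "Causal salad": include every available variable, collider and all.
effect_salad = ols_slopes(outcome, treatment, collider)[0]

print(f"treatment slope without the collider: {effect_dag:.2f}")
print(f"treatment slope with the collider:    {effect_salad:.2f}")
```

More data and more regularization do not remove this bias, because it comes from which variables are conditioned on rather than from noise.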

The distinction is still unclear to me. So in your first example, is marketing to those likely to click not an intervention in itself? Say I "market to those who are not likely to click". This to me seems like an intervention. — aranglol, Nov 19 '20 at 05:14
In the former, those customers were going to click anyway. I want to know who they are. In the latter, we want to turn someone who may not otherwise click into a clicking customer via an intervention (something that -- had it not existed -- would mean the customer would not click). The customer is getting the ad/email either way. The intervention could be a specific type of messaging or imaging or whatever — Demetri Pananos, Nov 19 '20 at 05:20
Or this example: my ML model consistently ranks those in urban areas as those who are likely to click. Equivalently, I market to the top, say, 300 individuals with the highest likelihood of clicking, who all just happen to be in urban areas for the most part. I just can't reason about the difference, because whether the cause is urban areas or not does not seem relevant. Or perhaps that's the point. — aranglol, Nov 19 '20 at 05:26
Sorry, I didn't see your comment before I posted mine above. So it would be incorrect to market to individuals with low predicted likelihood of clicking? — aranglol, Nov 19 '20 at 05:28
Not incorrect per se, but in uplift the goal is to market to the people we have the best chance of connecting with, because we only have a finite budget or number of impressions we can make. I highly suggest reading McElreath's chapters. Richard is a great writer and his examples are very good. — Demetri Pananos, Nov 19 '20 at 05:31
Okay, I think I understand more now. Marketing to those who aren't likely to respond is foolish because I don't know what I need to change with respect to my marketing to get these individuals to convert. That makes more sense now I think. — aranglol, Nov 19 '20 at 06:40
