Counterfactual estimation in machine learning models

Question

There are various techniques for building counterfactual estimates for certain variables from linear models in observational studies.

Some of these compare the predicted outcome when the exposure variable of interest is varied across the whole dataset while the other variables are kept fixed (conditional effect).

Would this technique also be valid for non-linear machine learning models? Given a $$\hat{f} = E(f(x, C))$$, a non-linear function of the exposure x and covariates C that estimates the real data-generating process $$y = f(x, C)$$, would $$\Delta\hat{y} = \hat{y}_1 - \hat{y}_0 = \hat{f}(1, C) - \hat{f}(0, C)$$

be a good estimator of $$\Delta y = y_1 - y_0 = f(1, C) - f(0, C)$$, with $\hat{y}_1, \hat{y}_0, y_1, y_0$ being the predicted and real counterfactual outcomes when the exposure is present or absent?

(Sorry if the notation is incorrect; feel free to edit.)

Landon Gibson · Answer 1 · 2019-02-08T20:49:11.293

0

Yes. Assuming the causal identification assumptions for the average treatment effect (ATE) are met, an ML model in which you predict the outcome for each individual under each value of treatment status and then average the differences gives you back the ATE. Once the causal identification assumptions are met, counterfactual estimation simply becomes an imputation problem (Xu 2017). The cited paper is an example of where the language of imputation is used to discuss counterfactual estimation.
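As a rough illustration of the predict-under-both-exposures-and-average idea, here is a minimal sketch on simulated data. Everything in it is an assumption made for the example and not part of the question: the data-generating process, the single covariate `c`, and the helper names (`simulate`, `fit_binned_model`, `g_computation_ate`). The "ML model" is a deliberately simple piecewise-constant (binned) regressor so the snippet needs only the standard library; any flexible learner could play the same role.

```python
import math
import random


def simulate(n, seed=0):
    """Simulated observational data (hypothetical): c confounds x and y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.uniform(-2.0, 2.0)
        p = 1.0 / (1.0 + math.exp(-c))        # exposure is more likely at high c
        x = 1 if rng.random() < p else 0
        # Non-linear outcome; the unit-level effect of x is 1 + c**2.
        y = math.sin(2 * c) + c + x * (1.0 + c ** 2) + rng.gauss(0.0, 0.5)
        rows.append((x, c, y))
    return rows


def fit_binned_model(rows, n_bins=20, lo=-2.0, hi=2.0):
    """Fit f_hat(x, c): mean outcome per (exposure, covariate-bin) cell."""
    width = (hi - lo) / n_bins

    def bin_of(c):
        return min(n_bins - 1, max(0, int((c - lo) / width)))

    sums, counts = {}, {}
    for x, c, y in rows:
        key = (x, bin_of(c))
        sums[key] = sums.get(key, 0.0) + y
        counts[key] = counts.get(key, 0) + 1

    def f_hat(x, c):
        key = (x, bin_of(c))
        if key not in counts:
            # No observed units with this exposure in this covariate region.
            raise ValueError("empty cell: positivity is violated here")
        return sums[key] / counts[key]

    return f_hat


def g_computation_ate(f_hat, rows):
    """Average f_hat(1, C_i) - f_hat(0, C_i) over the whole dataset."""
    diffs = [f_hat(1, c) - f_hat(0, c) for _, c, _ in rows]
    return sum(diffs) / len(diffs)


def naive_difference(rows):
    """Unadjusted difference in mean outcome between exposed and unexposed."""
    exposed = [y for x, _, y in rows if x == 1]
    unexposed = [y for x, _, y in rows if x == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


rows = simulate(4000, seed=0)
f_hat = fit_binned_model(rows)
ate_hat = g_computation_ate(f_hat, rows)
naive = naive_difference(rows)
true_ate = 1.0 + 4.0 / 3.0   # E[1 + c^2] for c ~ Uniform(-2, 2)

print("true ATE:", round(true_ate, 3))
print("imputation estimate:", round(ate_hat, 3))
print("naive difference:", round(naive, 3))
```

In this simulated setup the only confounder is observed and included in the model, so the averaged difference should land close to the true ATE, while the naive comparison is pulled away from it by confounding. With real data the estimate is only as good as the identification assumptions and the fit of the model.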

Xu, Y. (2017). Generalized synthetic control method: Causal inference with interactive fixed effects models. Political Analysis, 25(1), 57-76.

edited Feb 08 '19 at 20:49

answered Feb 03 '19 at 02:23

Landon Gibson


Hello, thanks for the answer. My hypothesis is that, given the covariates (C) that I measured because I thought they might be important for the outcome, I could estimate the distribution of y in my dataset by changing only the predictor x and leaving the rest of C fixed. The difference from a normal linear model is that the relationship between x, C and y is free to be non-linear. So I missed your point: the variable identification problem is common to every observational study, isn't it? Why do you think it could be particularly a problem with this method? – Bakaburg Feb 04 '19 at 17:23
If I were using a linear model, I would simply have used the beta coefficient of x as the ATE, but since the model is non-linear (and I have thousands of data points), I would estimate the ATE using the actual difference in y under the two scenarios. My question is whether this estimation methodology is sound, given that I gathered a number of covariates that leave most confounding out (actually, a simple linear model on a subgroup of those variables has an AUC of ≈ 0.8, so quite good). – Bakaburg Feb 04 '19 at 17:28
Hi there, sorry for the confusion! The answer to your question is yes: an ML model where you predict the outcome for each individual under different values of treatment status and then average the differences would give you back your ATE. This assumes that you have controlled for confounders correctly, such that treatment status is independent of the potential outcomes. – Landon Gibson Feb 05 '19 at 17:44
OK, thanks. If you edit your answer to motivate this (and, even better, add a reference), I can mark the question as solved! – Bakaburg Feb 06 '19 at 11:25
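
For reference, the condition mentioned in the comments (treatment status independent of the potential outcomes given C) is what licenses the averaging step. A sketch of the standard argument follows. The notation is assumed here and does not come from the question: Y(x) is the potential outcome under exposure x, X is the observed exposure, and consistency and positivity are assumed in addition to conditional independence.

$$
\begin{aligned}
\text{ATE} &= E[Y(1) - Y(0)] \\
&= E_C\big[\, E[Y(1) \mid C] - E[Y(0) \mid C] \,\big] \\
&= E_C\big[\, E[Y(1) \mid X = 1, C] - E[Y(0) \mid X = 0, C] \,\big] && \text{(conditional independence, positivity)} \\
&= E_C\big[\, E[Y \mid X = 1, C] - E[Y \mid X = 0, C] \,\big] && \text{(consistency)} \\
&\approx \frac{1}{n} \sum_{i=1}^{n} \big[ \hat{f}(1, C_i) - \hat{f}(0, C_i) \big]
\end{aligned}
$$

The last line is only an approximation: it holds to the extent that $\hat{f}(x, C)$ is a good estimate of $E[Y \mid X = x, C]$, whether the model is linear or not.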
