Over the past 15 years there has been progress in adapting machine learning methods for causal inference; examples include targeted learning, double machine learning, and causal trees.

Is there a textbook that covers the current range of techniques? I haven't seen anything on Amazon; perhaps there are texts available from other sources, or ones that will be published soon?

RobertF
  • ML doesn’t come to mind when thinking of causal inference – Aksakal Oct 19 '21 at 21:12
  • @Aksakal Agreed, but lately there's been some work in this area. ML algorithms have some advantages over traditional parametric models when estimating average treatment effects for complex, high-dimensional data. The tricky part is obtaining unbiased estimates and valid confidence intervals. – RobertF Oct 20 '21 at 02:32
  • Your question asks for recommended books on ML and causal inference, and useful suggestions have already been given. On causal inference and econometrics books that are not recommended, see https://stats.stackexchange.com/questions/477705/how-would-econometricians-answer-the-objections-and-recommendations-raised-by-ch/501338#501338 – markowitz Nov 14 '21 at 12:02

3 Answers

I follow this area pretty closely, but I think this subfield is so new that no textbook exists (yet).

However, there are some course videos that are fairly good:

  1. Machine Learning & Causal Inference: A Short Course at Stanford (accompanying tutorial)
  2. Summer Institute in Machine Learning in Economics (MLESI21) at University of Chicago

There is also a nice survey paper: "Machine learning methods that economists should know about" by Susan Athey and Guido Imbens in the Annual Review of Economics (link to draft)

dimitriy
  • Thanks dimitriy. Susan Athey's name comes up quite a bit when I google "machine learning causal inference" – RobertF Oct 20 '21 at 02:34

As dimitriy states, there isn't a single textbook yet (or at least none that I am aware of). However, there are a few textbook materials you can piece together to cover the topics you mentioned.

  1. Targeted Learning in Data Science covers the super learner (a generalized stacking algorithm you would almost always want to use in practice; see the sketch below) and targeted maximum likelihood estimation (with many variations of it). I think this one is preferable to the other targeted learning book, since the one linked above covers the machine learning parts a bit better.
  2. Chapter 18 of Hernan and Robins covers double machine learning.

Unfortunately, I don't have a recommendation for causal trees.
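
Since the super learner is essentially stacking whose meta-learner is trained on cross-validated (out-of-fold) predictions from a library of candidate learners, here is a minimal sketch using scikit-learn's StackingRegressor. The synthetic data and the candidate library are illustrative assumptions; note also that the classic super learner constrains the meta-learner weights to be non-negative and sum to one, which plain linear regression does not enforce.

```python
# Minimal stacking sketch in the spirit of the super learner.
# Assumptions: synthetic data and an arbitrary candidate library.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=10, noise=1.0, random_state=0)

# Out-of-fold predictions from each candidate (cv=5) become the features
# for the meta-learner, which learns how to weight the candidates.
super_learner = StackingRegressor(
    estimators=[
        ("ols", LinearRegression()),
        ("ridge", Ridge(alpha=1.0)),
        ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
    ],
    final_estimator=LinearRegression(),  # classic SL would use simplex-constrained weights
    cv=5,
)
print(cross_val_score(super_learner, X, y, cv=5).mean())
```

As Frank Harrell cautions in the comments below, a grossly overfitted candidate can still dominate the ensemble, so it is worth inspecting the fitted meta-learner weights.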

pzivich
  • Beware that the super learner can be dominated by a single "machine" that is grossly overfitted, making the whole ensemble highly overfitted. – Frank Harrell Oct 20 '21 at 12:30
  • Do you have a reference that demonstrates that? My impression is that cross-validation mitigates that (more so than alternative approaches) – pzivich Oct 20 '21 at 12:37
  • I just have an example analysis on a 40,000-patient clinical trial. Cross-validation reveals but doesn't fix that. The super learner was fooled. One of the learners was a random forest-like method that was overfitted to a degree I've never seen before. – Frank Harrell Oct 20 '21 at 12:42
  • @FrankHarrell A good topic for another question, I'd like to see this study. I'm curious if it's possible, even with cross-validation, to overfit a model to given training & test datasets? – RobertF Oct 20 '21 at 16:10
  • Thank you for the references - I've read part of Hernan & Robins and also Van der Laan and Rose's book. I understood the theory behind the ensemble super learner used for TMLE, but the theory for calculating the ATE gets pretty dense, involving influence curves & functional derivatives. At my last job interview with a health insurance company the interviewer mentioned they were using double machine learning, and it pops up in google searches - maybe DML is easier to understand and implement? – RobertF Oct 20 '21 at 16:24
  • Double Machine Learning also relies on influence curves (to derive the AIPW estimator). I don't know if I would say it is any easier to understand (implementation also depends). There are various libraries across software to implement either. A shameless plug, but I do have a more introductory paper on double cross-fitting (a variation on DML): https://pubmed.ncbi.nlm.nih.gov/33591058/ – pzivich Oct 20 '21 at 16:59
  • Thank you - do you have a pdf link to your paper? – RobertF Oct 20 '21 at 20:04
  • @RobertF the pre-print version is available here https://arxiv.org/abs/2004.10337 – pzivich Oct 21 '21 at 13:39
  • Fantastic, thank you! Regarding Double Machine Learning - my understanding is that in DML you fit one ML model of the outcome $Y$ on the confounders $W$ to get predictions $\hat{Y}$, and a second ML model of the treatment $T$ on $W$ to get $\hat{T}$. Then you fit a third model (which can be a simple regression) of the residuals $Y - \hat{Y}$ on $T - \hat{T}$ to obtain an unbiased estimate of the ATE that controls for confounders. No need to get into influence curves and functional derivatives. Is this correct, or am I oversimplifying? – RobertF Oct 21 '21 at 13:55
  • No, DML requires sample splitting. Chernozhukov et al. (2017) estimate the ATE using the augmented inverse probability weighting (AIPW) estimator with single cross-fitting; see https://stats.stackexchange.com/a/482498/247479 and the sketch after this comment thread. The third model doesn't happen (that's almost like TMLE, but not quite). But influence curves underlie all of it, since they are used to show that the estimator is semiparametric efficient, and the variance (particularly with DML) is estimated from the influence curves. For how AIPW works, I would read Funk et al. (2011), "Doubly Robust Estimation of Causal Effects". – pzivich Oct 21 '21 at 14:53
  • Understood, thank you! – RobertF Oct 21 '21 at 16:07
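
To make the estimator discussed in these comments concrete, here is a minimal sketch of AIPW with cross-fitting on simulated data. The random-forest nuisance models, two folds, propensity clipping, and data-generating process are all illustrative assumptions, not the implementation of Chernozhukov et al.

```python
# Hedged sketch of the AIPW estimator with cross-fitting.
# Assumptions: binary treatment, synthetic data, random-forest nuisances.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 2000
W = rng.normal(size=(n, 5))                 # confounders
e_true = 1 / (1 + np.exp(-W[:, 0]))         # true propensity score
T = rng.binomial(1, e_true)                 # binary treatment
Y = 2.0 * T + W[:, 0] + rng.normal(size=n)  # true ATE = 2.0

psi = np.zeros(n)  # per-unit AIPW (influence-curve) scores
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(W):
    # Fit nuisance models on one split, evaluate on the other (cross-fitting).
    ps = RandomForestClassifier(n_estimators=200, random_state=0).fit(W[train], T[train])
    e = np.clip(ps.predict_proba(W[test])[:, 1], 0.01, 0.99)  # clipped propensities

    m1 = RandomForestRegressor(n_estimators=200, random_state=0).fit(
        W[train][T[train] == 1], Y[train][T[train] == 1])
    m0 = RandomForestRegressor(n_estimators=200, random_state=0).fit(
        W[train][T[train] == 0], Y[train][T[train] == 0])
    mu1, mu0 = m1.predict(W[test]), m0.predict(W[test])

    t, y = T[test], Y[test]
    psi[test] = mu1 - mu0 + t * (y - mu1) / e - (1 - t) * (y - mu0) / (1 - e)

ate = psi.mean()
se = psi.std(ddof=1) / np.sqrt(n)  # variance estimated from the influence-curve scores
print(f"ATE estimate: {ate:.2f} (SE {se:.2f})")
```

With this setup the estimate should land near the true ATE of 2.0; in practice you would use more folds, a richer nuisance-learner library (e.g., a super learner), and possibly the double cross-fitting variant pzivich links above.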

For the most recent work, have a look at the Causal Learning and Reasoning (CLeaR) 2022 conference.

If you want to get started with ML and causal inference, I particularly recommend (disclaimer: I am one of the co-authors) looking at Kelly, Kong, and Goerg (2022), "Predictive State Propensity Subclassification (PSPS): A causal inference algorithm for data-driven propensity score stratification". It's a fully probabilistic framework for causal inference that learns causal representations in the predictive state space for Pr(outcome | treatment, features). See the paper for details.

For a ready-to-go TensorFlow/Keras implementation, see https://github.com/gmgeorg/pypsps, which includes code examples and notebook case studies.

Georg M. Goerg