
I'm running comparisons of different counterfactual modeling methodologies (exact matching, propensity score matching, regression, etc.) on simulated data in order to see which methods produce the most precise estimates of the "true" population treatment effects.

This works great for the Average Treatment Effect (ATE) - you can directly compute the expected ATE from the data generating process in the following R code:

### Simulation data from "Targeted Maximum Likelihood Estimation for Causal Inference in  ###
### Observational Studies", Schuler & Rose, 2016                                          ###
x1 <- rbinom(n=10000, size=1, prob=0.55)
x2 <- rbinom(n=10000, size=1, prob=0.3)
# x3 <- rbinom(n=10000, size=1, prob=0.1)

# Binary treatment variable
A <- rbinom(n=10000, size=1, prob=exp(-.5 + .75*x1 + x2)/(1 + exp(-.5 + .75*x1 + x2)))

# Continuous confounder variable
Z <- rnorm(n=10000, mean=100, sd=10)

# Outcome variable
Y <- rnorm(n=10000, mean=24 - 3*A + 3*x1 - 4*x2 + 7*x1*x2 + 5*A*x1 - 10*A*x2 + 15*A*x1*x2, sd=4.5)
# Expected ATE for Y = E(Y|A=1) - E(Y|A=0)
#                    = .45*.70*(-3) + .55*.70*(-3 + 5) + .45*.30*(-3 - 10) + .55*.30*(-3 + 5 - 10 + 15)
#                    = -0.775

However, many techniques find the Average Treatment Effect on the Treated (ATT), not the ATE. How would you find the expected ATT using the same data generating process formulas in the above example?

kjetil b halvorsen
RobertF

1 Answer


Sadly, there is no closed-form solution for the ATT except in certain cases. The formula for the ATE is the combined coefficient on A when evaluating the predictors at their means, i.e.,

-3 + 5*.55 - 10*.3 + 15*(.55*.3) 

which does equal -.775 as you have figured out. (Note that the final term should include the mean of x1*x2, which in this case happens to equal the product of the means of x1 and x2 because they are independent, but won't always.) To find the ATT, you would need to evaluate the combined coefficient on A at the predictor means in the treated group. The unfortunate part is that there is in general no clear way to find the predictor means in the treated group analytically.
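As a quick check in R (independence of x1 and x2 is what lets the mean of x1*x2 equal the product of the two means here):

```r
p1 <- 0.55  # P(x1 = 1)
p2 <- 0.30  # P(x2 = 1)

# Combined coefficient on A, evaluated at the predictor means
ate <- -3 + 5 * p1 - 10 * p2 + 15 * (p1 * p2)
ate  # -0.775
```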

You can use simulations to estimate the ATT given that you know the parameters of the outcome-generating process: in a large dataset, compute the predictor means among the treated (i.e., mean(x1[A==1]), mean(x2[A==1]), and mean((x1*x2)[A==1])), and then plug those into the formula, i.e.,

-3 + 5*mean(x1[A==1]) - 10*mean(x2[A==1]) + 15*mean((x1*x2)[A==1]) 

That should produce an estimate close to the true ATT. You can repeat this many times and average the estimates to get even closer to the true ATT. When I did this, I got approximately -.21 for the ATT.
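For instance, a sketch using the question's data-generating process (Y itself isn't needed, since the coefficients on A are known; plogis(x) is the same as exp(x)/(1 + exp(x))):

```r
set.seed(42)
n <- 1e6  # large sample to keep Monte Carlo error small

# Data-generating process from the question
x1 <- rbinom(n, size = 1, prob = 0.55)
x2 <- rbinom(n, size = 1, prob = 0.30)
A  <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.75 * x1 + x2))

# Combined coefficient on A, evaluated at the treated-group predictor means
att_hat <- -3 + 5 * mean(x1[A == 1]) - 10 * mean(x2[A == 1]) +
  15 * mean((x1 * x2)[A == 1])
att_hat  # close to -0.21
```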

Noah
  • Thanks Noah! This is good to know. I was speculating the E(ATT) could be calculated by multiplying each of the four terms in my E(ATE) formula example by P(A=1) using the logistic regression data generating function. Would that not work? – RobertF Apr 17 '21 at 19:51
  • BTW while on the topic of ATTs, is it possible to estimate the ATT, not the ATE, using Targeted Maximum Likelihood Estimation by substituting the inverse weights for the ATT in the ATE clever covariate equation? So change H(A,W) = I(A=1)/p_hat - I(A=0)/(1 - p_hat) to H(A,W) = I(A=1)*1/1 - I(A=0)*p_hat/(1 - p_hat). – RobertF Apr 17 '21 at 20:08
  • 1
    Your method assumes ATT = P(A=1)*ATE, which is not true. I don't know enough about TMLE to answer that, but it would make a good separate question. – Noah Apr 18 '21 at 03:04
  • Posted a question about the TMLE ATT in Cross Validated. Also the appendix of Rose & Van der Laan's book on Targeted Learning includes the derivation of the clever covariate for ATE so might help with finding the calculation for the ATT. – RobertF Apr 20 '21 at 00:41
  • Thinking some more about the possibility of a closed-form solution for the ATT. If the predictor variables in the data generating regression formula are all binary, then `mean(x1[A==1])` is equivalent to E(x1|A=1) = P(x1=1|A=1) = P(A=1|x1=1)P(x1=1)/P(A=1) according to Bayes Rule, which I think has an analytical solution. If x1 is continuous then the conditional expectation E(x1|A=1) is more difficult to evaluate. – RobertF May 03 '21 at 04:21
  • I thought so too and even tried to figure it out myself but couldn't. When I used Bayes' rule I got the wrong answer each time. If you figure it out I'd love to see it. – Noah May 03 '21 at 07:52
  • Keeping things simple, using only the x1 predictor: `x1 – RobertF May 03 '21 at 14:58
  • May have gotten a close value by coincidence. Need to see if calculations hold up when x2 predictor is added to the DGF regressions. – RobertF May 03 '21 at 15:02
  • For two predictors, the calculation is the same except P(A=1|x1=1) is averaged across values of the second variable x2: P(A=1|x1=1) = P(A=1|x1=1, x2=1)P(x2=1) + P(A=1|x1=1, x2=0)P(x2=0) = 0.6267135. Plugging this into the Bayes Rule formula we get P(x1=1|A=1) = 0.6233971. This is pretty close to `mean(df$A[df$x1==1])` = 0.6210179 from the simulated data. So the analytical solution appears to be working, however the number of terms in the formula grows as 2^(k-1) for k predictors, making bootstrap estimates a more viable alternative for large values of k. – RobertF May 05 '21 at 02:23
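Since both predictors in this DGP are binary, the stratum-by-stratum calculation sketched in these comments can be written out exactly in R: enumerating the four (x1, x2) cells gives the treated-group means analytically, and hence a closed-form ATT for this particular process.

```r
# The four (x1, x2) strata of the question's DGP
strata <- expand.grid(x1 = 0:1, x2 = 0:1)
p_x <- dbinom(strata$x1, 1, 0.55) * dbinom(strata$x2, 1, 0.30)  # P(x1, x2)
p_a <- plogis(-0.5 + 0.75 * strata$x1 + strata$x2)              # P(A=1 | x1, x2)
p_a1 <- sum(p_x * p_a)                                          # P(A=1), about 0.548

# Treated-group predictor means via Bayes' rule
e_x1   <- sum(p_x * p_a * strata$x1)             / p_a1  # E(x1    | A=1)
e_x2   <- sum(p_x * p_a * strata$x2)             / p_a1  # E(x2    | A=1)
e_x1x2 <- sum(p_x * p_a * strata$x1 * strata$x2) / p_a1  # E(x1*x2 | A=1)

att <- -3 + 5 * e_x1 - 10 * e_x2 + 15 * e_x1x2
att  # about -0.216, consistent with the simulation-based value
```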