Why is Average Treatment Effect different from Average Treatment effect on the Treated?

Question

In RCTs, randomisation balances unmeasured confounders and, I'm told, ATE and ATT would be the same. In observational studies, this is not possible and Propensity Scores are used in various ways to estimate ATT and/or ATE. The analyses that I've performed and the examples that I've seen (eg this helpful text) show different ATT and ATE (albeit slightly).

Please can anyone help me understand why they are different and, more importantly, what the differences mean (eg. if ATE>ATT or ATT>ATE), if anything?

score 42 · Accepted Answer · answered Oct 17 '17 at 23:29

The Average Treatment Effect (ATE) and the Average Treatment Effect on the Treated (ATT) are defined over different groups of individuals. In addition, ATE and ATT often differ because the outcomes ($Y$) of those groups might not be affected by the treatment $D$ in the same manner.

First, some additional notation:

$Y^0$: population-level random variable for outcome $Y$ in control state.
$Y^1$: population-level random variable for outcome $Y$ in treatment state.
$\delta$: individual-level causal effect of the treatment.
$\pi$: proportion of population that takes treatment.
$D$: treatment indicator ($D=1$ for the treatment group, $D=0$ for the control group).

Given the above, the ATT is defined as $\mathrm{E}[\delta|D=1]$, ie. the expected causal effect of the treatment for individuals in the treatment group. This can be decomposed more meaningfully as: \begin{align} \mathrm{E}[\delta|D=1] =& \mathrm{E}[Y^1 - Y^0|D=1] \\ =& \mathrm{E}[Y^1|D=1] - \mathrm{E}[Y^0|D=1] \end{align}

(Notice that $\mathrm{E}[Y^0|D=1]$ is unobserved so it refers to a counterfactual variable which is not realised in our observed sample.) Similarly the ATE is defined as: $\mathrm{E}[\delta]$, ie. what is the expected causal effect of the treatment across all individuals in the population. Again we can decompose this more meaningfully as: \begin{align} \mathrm{E}[\delta] =& \{ \pi \mathrm{E}[Y^1|D=1] + (1-\pi) \mathrm{E}[Y^1|D=0] \} \\ -& \{ \pi \mathrm{E}[Y^0|D=1] + (1-\pi) \mathrm{E}[Y^0|D=0] \} \end{align}
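
Regrouping the terms of the ATE decomposition by treatment status may make the link between the two quantities easier to see. The label ATC (average treatment effect on the controls, $\mathrm{E}[\delta|D=0]$) is shorthand introduced here for convenience:

\begin{align} \mathrm{E}[\delta] =& \pi \{ \mathrm{E}[Y^1|D=1] - \mathrm{E}[Y^0|D=1] \} + (1-\pi) \{ \mathrm{E}[Y^1|D=0] - \mathrm{E}[Y^0|D=0] \} \\ =& \pi \, \mathrm{E}[\delta|D=1] + (1-\pi) \, \mathrm{E}[\delta|D=0] \\ =& \pi \, \mathrm{ATT} + (1-\pi) \, \mathrm{ATC} \end{align}

It follows that $\mathrm{ATE} - \mathrm{ATT} = (1-\pi)(\mathrm{ATC} - \mathrm{ATT})$, so the two coincide exactly when the average effect is the same in both groups (or, trivially, when everybody is treated).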

As you can see, the ATT and the more general ATE refer by definition to different portions of the population of interest. More importantly, in the ideal scenario of a randomised controlled trial (RCT) ATE equals ATT because we assume that:

$\mathrm{E}[Y^0|D=1] = \mathrm{E}[Y^0|D=0]$ and
$\mathrm{E}[Y^1|D=1] = \mathrm{E}[Y^1|D=0]$,

ie. we believe respectively that:

the baseline of the treatment group equals the baseline of the control group (layman terms: people in the treatment group would do as badly as the control group if they were not treated) and
the outcome under treatment of the treatment group equals the outcome under treatment of the control group, so that, together with the first assumption, the treatment effect is the same in both groups (layman terms: people in the control group would do as well as the treatment group if they were treated).

These are very strong assumptions which are commonly violated in observational studies and therefore the ATT and the ATE are not expected to be equal. (Notice that if only the baselines are equal, you can still get an ATT through simple differences: $\mathrm{E}[Y^1|D=1] - \mathrm{E}[Y^0|D=0]$.)
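
The parenthetical remark can be checked by adding and subtracting the unobserved counterfactual $\mathrm{E}[Y^0|D=1]$ in the simple difference of observed group means. This is a short derivation using only the notation above:

\begin{align} \mathrm{E}[Y^1|D=1] - \mathrm{E}[Y^0|D=0] =& \{ \mathrm{E}[Y^1|D=1] - \mathrm{E}[Y^0|D=1] \} + \{ \mathrm{E}[Y^0|D=1] - \mathrm{E}[Y^0|D=0] \} \\ =& \mathrm{E}[\delta|D=1] + \{ \mathrm{E}[Y^0|D=1] - \mathrm{E}[Y^0|D=0] \} \end{align}

The second bracket is the difference in baselines. When it is zero the simple difference equals the ATT, otherwise it is the ATT plus a baseline (selection) bias term.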

Especially in cases where individuals self-select into the treatment group or not (eg. an e-shop providing a cash bonus, where a customer can redeem a bonus coupon for $X$ amount given she buys items worth at least $Y$ amount), the baselines as well as the treatment effects can be different (eg. repeat buyers are more likely to redeem such a bonus, low-value customers might find the threshold $Y$ unrealistically high, or high-value customers might be indifferent to the bonus amount $X$; this also relates to SUTVA). In scenarios like this the ATE itself is probably ill-defined (eg. it is unrealistic to expect that all the customers of an e-shop will ever buy items worth $Y$).
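
To make the self-selection point concrete, here is a minimal simulation sketch. All names and numbers (`y0`, `delta`, the logistic selection rule, the seed) are assumptions chosen purely for illustration and do not come from any real dataset. Individuals with larger gains are made more likely to take the treatment, while the baselines are left equal across groups.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical potential outcomes: every unit has its own treatment effect.
y0 = rng.normal(10.0, 2.0, size=n)    # outcome in the control state
delta = rng.normal(1.0, 1.0, size=n)  # individual-level causal effect
y1 = y0 + delta                       # outcome in the treatment state


def summarise(d):
    """Return (ATE, ATT, difference in observed group means) for assignment d."""
    y_obs = np.where(d == 1, y1, y0)  # only one potential outcome is observed
    ate = delta.mean()
    att = delta[d == 1].mean()
    naive = y_obs[d == 1].mean() - y_obs[d == 0].mean()
    return ate, att, naive


# (a) Randomised assignment: D is independent of the potential outcomes.
d_rct = rng.binomial(1, 0.5, size=n)

# (b) Self-selection (assumed rule): larger gains make uptake more likely.
p_select = 1.0 / (1.0 + np.exp(-2.0 * (delta - 1.0)))
d_self = rng.binomial(1, p_select)

ate_rct, att_rct, naive_rct = summarise(d_rct)
ate_self, att_self, naive_self = summarise(d_self)

print(f"RCT:            ATE={ate_rct:.3f}  ATT={att_rct:.3f}  naive={naive_rct:.3f}")
print(f"Self-selection: ATE={ate_self:.3f}  ATT={att_self:.3f}  naive={naive_self:.3f}")
```

In this setup the ATT exceeds the ATE under self-selection, whereas under randomised assignment the two agree up to sampling noise. Because the assumed selection rule does not depend on $Y^0$, the baselines stay equal and the simple difference in observed means still tracks the ATT rather than the ATE.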

ATT being unequal to ATE is not unexpected. Whether ATT is smaller or greater than ATE is application-specific. The inequality of the two suggests that the treatment assignment mechanism was potentially not random. In general, because the above-mentioned assumptions do not usually hold in an observational study, we either partition our sample accordingly or we control for differences through "regression-like" techniques.
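
As a rough sketch of the "partition the sample" route, the simulation below assumes a single observed binary covariate `x` (hypothetically, a repeat-buyer flag) that drives both treatment uptake and the size of the effect. Every name and number is an illustrative assumption. The key assumption, stated loudly, is that treatment is as good as random within each stratum of `x`; with unmeasured confounders this would not work.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical observed covariate, e.g. x = 1 for repeat buyers.
x = rng.binomial(1, 0.4, size=n)

# Assumed data-generating process: baseline and effect both depend on x.
y0 = 5.0 + 3.0 * x + rng.normal(0.0, 1.0, size=n)
delta = np.where(x == 1, 2.0, 0.5)
y1 = y0 + delta

# Uptake depends on x only, so assignment is random within each stratum.
p_treat = np.where(x == 1, 0.7, 0.2)
d = rng.binomial(1, p_treat)
y = np.where(d == 1, y1, y0)  # observed outcome


def stratum_effect(v):
    """Difference in observed means between treated and controls with x == v."""
    treated = y[(x == v) & (d == 1)]
    controls = y[(x == v) & (d == 0)]
    return treated.mean() - controls.mean()


strata = (0, 1)
effect = {v: stratum_effect(v) for v in strata}

# ATE weights strata by population share; ATT by share among the treated.
w_all = {v: np.mean(x == v) for v in strata}
w_treated = {v: np.mean(x[d == 1] == v) for v in strata}

ate_hat = sum(w_all[v] * effect[v] for v in strata)
att_hat = sum(w_treated[v] * effect[v] for v in strata)
naive = y[d == 1].mean() - y[d == 0].mean()

print(f"ATE estimate={ate_hat:.3f}  (simulated truth {delta.mean():.3f})")
print(f"ATT estimate={att_hat:.3f}  (simulated truth {delta[d == 1].mean():.3f})")
print(f"Unadjusted difference in means={naive:.3f}")
```

The same stratum-level effects yield either quantity; only the weights change. Here the treated group contains relatively more units from the high-effect stratum, so the ATT estimate comes out above the ATE estimate, while the unadjusted difference is larger than both because the baselines differ between groups.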

For a more detailed but easy to follow exposition of the matter I recommend looking into Morgan & Winship's Counterfactuals and Causal Inference.

Thank you very much for this incredibly detailed and helpful answer. I'm not a statistician and struggle at times with formulae, but this is very clear. Will Morgan & Winship's book be digestible by a layman, or can you suggest a "dummy's guide" to causal inference? Thanks again — bobmcpop, Oct 25 '17 at 02:30
I am glad I could help. I have educated myself mostly from papers so I have a limited view of what causal inference books are out there. That said, I have found M&W's book to be clear and easy to comprehend; I think an inclined layman will have little trouble following it. The book is part of the "*Analytical Methods for Social Research*" series from Cambridge Univ. Press so it uses mostly Sociology-based examples. @DimitriyV.Masterov might have a more educated suggestion. — usεr11852, Oct 25 '17 at 23:42
Thanks, I'll get myself a copy. "The inequality of the two suggests that the treatment assignment mechanism was potentially not random." I assume in a hypothetical situation where literally every baseline confounder was measured in an observational study, and there was a perfect match for each PS, we would get very close to those assumptions. Therefore would the extent to which ATT/ATE are discordant provide any meaningful information about how poorly the PS balanced for unmeasured confounders? — bobmcpop, Nov 07 '17 at 14:24
In a *hypothetical* situation, yes. I think it would be meaningful in the context of a simulation study. That said, actually using it to quantify "poorness/goodness" of balance achieved by PS is probably a methodological exercise in its own right. (Happy reading!) — usεr11852, Nov 07 '17 at 20:56
@bobmcpop I'm a statistician, and I never have understood why someone would want to measure the ATT instead of the ATE. It's important to have a control group to account for regression to the mean effects & other factors - you're missing this when you use the ATT. — RobertF, Oct 18 '18 at 20:42
This answer would be even better if you could spell out the steps & formulas that are used for calculating the ATT. Say you've matched the treatments with the controls on propensity scores. What's the next step? — RobertF, Oct 18 '18 at 20:51
@RobertF: ATT can be relevant especially in cases where we have an inherent selection bias in our sample. For example, we might want to measure the effect on those who would "typically"/"be likely to" take up a treatment (think of the bonus conditional on some expenditure example I mention) instead of across all people who could potentially take up the treatment. — usεr11852, Oct 18 '18 at 20:54
@RobertF: You are welcome to open a new question regarding the use of propensity scores for the calculation of ATE and ATT. — usεr11852, Oct 18 '18 at 20:57
I asked a very similar question on this site: https://stats.stackexchange.com/questions/238431/is-the-average-treatment-effect-on-the-treated-att-a-meaningful-comparison-in. I didn't get an answer, but in the comments was told that the ATT is estimated as the average difference between trmt and controls with a 1:1 match. — RobertF, Oct 18 '18 at 21:15
I think that ATE, ATT, ATC, etc. are all defined in terms of our research question; please see my comment from about 60' ago. As you correctly mention, this is a potential estimation method and not the actual meaning of the estimator. — usεr11852, Oct 18 '18 at 21:57
@usεr11852 I asked my colleagues in my workplace (a health insurance company) about their choice of estimating ATT over ATE for a case vs. control propensity score matching study. They have more of an operational rather than mathematical definition of ATT - out of a pool of candidates eligible to receive a treatment, only a small percentage received the treatment during a given time period. The rest of the eligible population are considered to be an ad hoc "control group" of sorts even though they're scheduled to receive the treatment. Hence ATT rather than ATE, but the difference is semantic. — RobertF, Oct 25 '18 at 18:23
@RobertF: The idea of "ad hoc control" seems dangerous to me because of selection bias. (I work in industry too.) — usεr11852, Oct 26 '18 at 20:51

score 7 · Answer 2 · Yao Zhao · 2020-05-14T18:33:00.160

ATE is the average treatment effect, and ATT is the average treatment effect on the treated.

The ATT is the effect of the treatment actually applied. Medical studies typically use the ATT as the designated quantity of interest because they often only care about the causal effect of drugs for patients that receive or would receive the drugs.

For another example, ATT tells us how much the typical soldier gained or lost as a consequence of military service, while ATE tells us how much the typical applicant to the military gained or lost.

edited May 14 '20 at 18:33

answered Mar 21 '20 at 00:09

Yao Zhao


You're confusing the ATT with the ITT, intent-to-treat effect. – Noah Mar 21 '20 at 03:43

I am not confusing them. I refer to this paper: Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis, 15(3), 199-236. – Yao Zhao Mar 21 '20 at 15:51

You're right, I misunderstood. – Noah Mar 21 '20 at 18:20

This is one of the clearest explanations I have seen of ATE vs ATT – asdfkjasdf33 May 14 '20 at 14:29
