Propensity Score Matching – How do the mechanics lead to a different result than unmatched?

Question

The gist of propensity score matching, as I understand it, is as follows:

You want to estimate the average treatment effect (ATE) of a treatment on some outcome. However, if you simply calculate the difference between the average outcome of the treated and untreated groups, this may be a biased estimate of ATE if factors that influence the outcome variable also influence the probability of receiving the treatment in the first place.

Propensity score matching mitigates this problem by matching treated and untreated observations with similar estimated probabilities of receiving treatment (typically obtained from a logistic regression of treatment status on covariates), and then estimates the ATE as the average difference in outcomes among the matched pairs.
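To make the procedure concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the simulated data, the single covariate `x`, the constant `TRUE_EFFECT`, the hand-rolled `fit_logistic` helper, and the choice of nearest-neighbour matching with replacement. Because it matches each treated unit to a control, it strictly targets the effect on the treated, which coincides with the ATE here only because the simulated effect is constant.

```python
import math
import random

random.seed(0)  # fixed seed so the simulated data are reproducible

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def avg(values):
    return sum(values) / len(values)

# Hypothetical simulated data: the covariate x raises both the chance of
# treatment and the outcome, so the naive comparison is confounded.
TRUE_EFFECT = 2.0
n = 400
x = [random.uniform(-2, 2) for _ in range(n)]
t = [1 if random.random() < sigmoid(xi) else 0 for xi in x]
y = [TRUE_EFFECT * ti + 3.0 * xi + random.gauss(0, 1) for xi, ti in zip(x, t)]

def fit_logistic(x, t, lr=0.5, steps=2000):
    """Fit P(t=1|x) = sigmoid(b0 + b1*x) by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(steps):
        resid = [ti - sigmoid(b0 + b1 * xi) for xi, ti in zip(x, t)]
        b0 += lr * avg(resid)
        b1 += lr * avg([r * xi for r, xi in zip(resid, x)])
    return b0, b1

# Step 1: estimate propensity scores.
b0, b1 = fit_logistic(x, t)
ps = [sigmoid(b0 + b1 * xi) for xi in x]

treated = [i for i in range(n) if t[i] == 1]
control = [i for i in range(n) if t[i] == 0]

# Naive estimate: difference in group means.
naive = avg([y[i] for i in treated]) - avg([y[i] for i in control])

# Step 2: match each treated unit to the control with the closest score
# (with replacement, so a control may serve as a match more than once).
diffs = []
for i in treated:
    j = min(control, key=lambda c: abs(ps[c] - ps[i]))
    diffs.append(y[i] - y[j])

# Step 3: average the within-pair outcome differences.
matched = avg(diffs)

print(f"naive estimate:   {naive:.2f}")
print(f"matched estimate: {matched:.2f}")
```

In this setup the controls are reused unevenly, so the matched estimate is not just a rearrangement of the naive one, and it should land much closer to `TRUE_EFFECT`.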

So far, so good? This sounds fine conceptually, but where I have trouble is in seeing how the actual mechanics lead to different outcomes for matched as opposed to naive ATE estimation.

To illustrate:

Suppose there are four individuals, $X_a, X_b, Y_a, Y_b$, where $X$ indicates that the person did not receive the treatment, $Y$ indicates that the person did receive the treatment, the $a$s have similar covariate values to each other, and the $b$s have similar covariate values to each other.

And suppose $F(\cdot)$ denotes the outcome for which you are attempting to estimate the effect of treatment.

You first estimate ATE naively, looking at the simple difference between the average outcome of the treated and the average outcome of the untreated.

Naive ATE estimate: $\frac{F(Y_a)+F(Y_b)}2 - \frac{F(X_a)+F(X_b)}2$

Next, you estimate ATE by first matching on propensity score. As mentioned, the subscript indexing each individual reflects covariate values, and so after we run the logistic regression (ignoring sample size issues), we find that $X_a$ and $Y_a$ have similar propensity scores to each other, while $X_b$ and $Y_b$ have similar propensity scores to each other. We proceed to look at the average difference among these matched pairs.

Matched ATE estimate: $\{[F(Y_b)-F(X_b)] + [F(Y_a)-F(X_a)]\}/2$

The problem is that the naive ATE estimate and the matched ATE estimate are mathematically equivalent!
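Spelling out the algebra for the four-person example, where each individual appears in exactly one pair:

$$\frac{[F(Y_b)-F(X_b)] + [F(Y_a)-F(X_a)]}{2} = \frac{F(Y_a)+F(Y_b)}{2} - \frac{F(X_a)+F(X_b)}{2}$$

The same rearrangement goes through for any number of pairs. Writing $i = 1, \dots, n$ for the matched pairs (an index introduced here only for illustration), and assuming every treated and every untreated individual is used exactly once:

$$\frac{1}{n}\sum_{i=1}^{n}\left[F(Y_i)-F(X_i)\right] = \frac{1}{n}\sum_{i=1}^{n}F(Y_i) - \frac{1}{n}\sum_{i=1}^{n}F(X_i)$$

So under those assumptions the mean of the pairwise differences is just the difference of the group means, regardless of who is paired with whom.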

Now I'm sure I've made a mistake in my formulation of the matched ATE estimate. My question is, where did I go wrong?

P.S: I am aware that propensity score matching can also be used to drop observations that don't have suitable matches, but I want to ignore that case because my understanding of propensity score matching is that it should lead to a different estimate than a naive estimation even if all observations are matched.

Short answer is both methods give the same estimate of treatment effect if the sample size in treated and control groups is the same and you match all observations one to one. — jsk, May 18 '14 at 00:19
Thanks @jsk . I must say though that I am surprised by this. While it's nice to know that I did not make a mistake in my example, I'm just surprised that I've never seen this caveat mentioned in any explanation of propensity score matching I've read, as it seems like an important caveat to note. — Yakkanomica, May 18 '14 at 00:38
@Yakkanomica: What would the caveat warn of? No snares here that I can see. And note that the variance of the ATE estimate will differ if you take the pairing into account in the analysis (not everyone does). — Scortchi - Reinstate Monica, May 18 '14 at 01:02
@Scortchi, the caveat would warn of precisely what jsk's comment says: that if you don't drop any observations and match one-to-one, you will get the same ATE estimate as you would from a naive estimate that makes no attempt to match. Perhaps this is obvious to others, or perhaps it just never comes up in practice, but to me this was certainly a surprise to learn. That is an interesting point about the variance. — Yakkanomica, May 18 '14 at 01:10
Caveats warn of restrictions on the applicability or interpretation of analyses; which is why I say this observation doesn't necessitate any. Perhaps it doesn't come up in explanations because propensity score matching is seen primarily as a way to select a control group & secondarily (if at all) as a way to reduce the variance of ATE estimates by pairing like observations. — Scortchi - Reinstate Monica, May 18 '14 at 01:25
@Scortchi I would consider this a restriction on the interpretation of analyses, as one might falsely assume they have reduced bias in their ATE estimate relative to a naive estimator. But maybe others would not assume that. In any event I only meant "caveat" in the dictionary sense of a warning; I was not aware of the more specific statistical definition. — Yakkanomica, May 18 '14 at 01:35
"Watch out for that snake!" is a warning, but not a caveat. In "You needn't worry about being bitten, as long as you're wearing your boots" there's a caveat, defined by whatever dictionary Google uses as *a warning or proviso of specific stipulations, conditions, or limitations*. I say this not purely out of pedantry but because statistical caveats - "as long as the variances of each group are equal" - are of a kind with non-statistical ones. — Scortchi - Reinstate Monica, Jun 02 '14 at 14:58
There are many problems with propensity score matching as detailed in [BBR](http://hbiostat.org/doc/bbr.pdf). Chief among them is the exclusion of valid observations, reducing the sample size and hence the power. — Frank Harrell, Sep 03 '19 at 12:09
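Two points from the comments above can be checked with a small sketch: the point estimates coincide under one-to-one matching with equal group sizes, and the standard error changes once the pairing is taken into account. The outcome values below are made up for illustration, and the matches are simply assumed rather than derived from any propensity model.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical outcomes; position i in each list forms matched pair i.
treated = [5.0, 9.0, 7.0, 12.0]
control = [3.0, 7.0, 4.0, 10.0]
n = len(treated)

# Equal group sizes, each unit used exactly once: estimates coincide.
naive = mean(treated) - mean(control)
pair_diffs = [yt - yc for yt, yc in zip(treated, control)]
matched = mean(pair_diffs)

# The standard errors differ depending on whether pairing is used.
se_unpaired = math.sqrt(variance(treated) / n + variance(control) / n)
se_paired = stdev(pair_diffs) / math.sqrt(n)

# Unequal group sizes: add a fifth treated unit and reuse the last
# control as its match (matching with replacement). No observation is
# dropped, but that control now counts twice in the matched estimate.
treated_5 = treated + [11.0]
matched_controls_5 = control + [control[-1]]
naive_5 = mean(treated_5) - mean(control)
matched_5 = mean(yt - yc for yt, yc in zip(treated_5, matched_controls_5))

print("equal sizes:  ", naive, matched)
print("SEs:          ", se_unpaired, se_paired)
print("unequal sizes:", naive_5, matched_5)
```

The divergence in the last case comes entirely from the reused control being counted twice in the matched estimate and once in the naive one.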
