Why should you use propensity score matching?

Question

I have a question about propensity score matching and how it is used in analyzing non-randomized datasets. I understand how it is performed (I think!), but not why I should use it and not something else. Let me illustrate what I mean with an example.

Assume I have a dataset with a treatment (x, binary variable), an outcome (y, numeric) and one covariate (z, numeric). The task is to determine the effect of the treatment (x) on the outcome (y). The treatment, however, was not randomly administered, so we need to take the covariate (z) into account. A real-world example would be y=sick days per year, x=some drug and z=age.

If I understand correctly, this is how you (can) go about it if you apply propensity score matching:

1. Predict treatment x from the covariate z, e.g. using logistic regression
2. Group patients based on this predicted probability of treatment = propensity score
3. For each propensity score group, calculate the average outcome (y) for patients with the treatment (x=1) and without the treatment (x=0)
4. For each propensity score group, calculate the difference between x=1 and x=0 averages above and then average all those differences (maybe using some weighting scheme)
5. The average difference is the treatment effect

Please let me know if I got this part wrong!
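
The steps listed above can be sketched in code roughly as follows. This is only an illustration on simulated data: the sample size, the coefficients (a true effect of -2), the choice of 10 equally sized groups, and the helper names `fit_logistic` and `stratified_effect` are all assumptions made for the example, not part of the question.

```python
import numpy as np

def fit_logistic(features, target, n_iter=25):
    """Fit P(target=1 | features) = sigmoid(b0 + b1*features) by Newton's method."""
    design = np.column_stack([np.ones(len(target)), features])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        gradient = design.T @ (target - p)
        hessian = (design * (p * (1.0 - p))[:, None]).T @ design
        beta += np.linalg.solve(hessian, gradient)
    return beta

def stratified_effect(y, x, ps, n_strata=10):
    """Average the within-group treated-minus-control differences, weighted by group size."""
    order = np.argsort(ps)                      # sort units by propensity score
    total, n_used = 0.0, 0
    for idx in np.array_split(order, n_strata): # equally sized score groups
        treated = y[idx][x[idx] == 1]
        control = y[idx][x[idx] == 0]
        if len(treated) == 0 or len(control) == 0:
            continue                            # skip groups with no comparison
        total += len(idx) * (treated.mean() - control.mean())
        n_used += len(idx)
    return total / n_used

# Simulated data (hypothetical): z drives both treatment and outcome.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                                   # e.g. standardized age
x = (rng.random(n) < 1.0 / (1.0 + np.exp(-0.8 * z))).astype(float)
y = -2.0 * x + 1.5 * z + rng.normal(size=n)              # true effect is -2

# Step 1: predict treatment from z; the fitted probability is the propensity score.
b0, b1 = fit_logistic(z, x)
ps = 1.0 / (1.0 + np.exp(-(b0 + b1 * z)))

# Steps 2 to 5: group by score, compare within groups, average the differences.
effect = stratified_effect(y, x, ps)

# For comparison: the raw difference in means, which ignores z.
naive = y[x == 1].mean() - y[x == 0].mean()

print("stratified estimate:", effect)
print("naive difference:   ", naive)
```

With more groups, the units compared within a group become more alike in z, at the cost of fewer observations per comparison.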

Another alternative, not involving propensity score at all, would be like this:

1. Predict the outcome (y) from the covariate AND the treatment (x), e.g. using logistic regression
2. Study the size, sign of the coefficient in front of x, and also check if it is statistically significant
3. Done!
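
A minimal sketch of this regression alternative, again on simulated data with hypothetical coefficients (true effect -2). Because y is numeric in this example, the sketch uses ordinary least squares rather than logistic regression; that substitution is an assumption of the example.

```python
import numpy as np

# Simulated data (hypothetical): z drives both treatment and outcome.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
x = (rng.random(n) < 1.0 / (1.0 + np.exp(-0.8 * z))).astype(float)
y = -2.0 * x + 1.5 * z + rng.normal(size=n)   # true effect is -2

# Regress y on an intercept, the treatment x and the covariate z.
design = np.column_stack([np.ones(n), x, z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# Classical OLS standard errors: sigma^2 * (X'X)^{-1}.
residuals = y - design @ coef
sigma2 = residuals @ residuals / (n - design.shape[1])
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
t_stat = coef / std_err

print("coefficient on x:", coef[1])
print("standard error:  ", std_err[1])
print("t statistic:     ", t_stat[1])
```

A t statistic well beyond about 2 in absolute value is the usual rough signal that the coefficient differs from zero.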

Will the two methods give me the same result (probably not)? If not, which one is preferred and why? Does it differ if I have more than one treatment or covariate?

I'd be very happy if you could find the time to help me with this one. And please keep in mind that I'm just a simple engineer, so please limit the number of difficult statistical words and formulas :-)

The first approach sounds like *stratification on propensity score*, not *matching*. — Michael M, Feb 12 '21 at 12:25
See also [Why match if you have the control data already?](https://stats.stackexchange.com/questions/455567/why-match-if-you-have-the-control-data-already/455585#455585), [Does normal linear regression in R overcome confounding?](https://stats.stackexchange.com/questions/432700/does-normal-linear-regression-in-r-overcome-confounding). — Noah, Feb 13 '21 at 20:55

score 1 · Answer 1 · answered Feb 12 '21 at 13:34

Let's suppose we could simply observe the joint distribution $\Pr(Y,X,Z)$. We don't have to estimate anything. We can calculate anything we want from this joint distribution.

Our assumptions about how the outcome is determined (a causal model) say the treatment effect, $E[Y_{X=1} - Y_{X=0}]$, isn't equal to $E[Y|X=1] - E[Y|X=0]$. There's a confounding variable, $Z$, that contaminates the association of $Y$ and $X$ so that it doesn't just reflect $X$ affecting the value of $Y$.

But it also says that $E[Y_{X=1} - Y_{X=0}|Z=z]$, the treatment effect for units where $Z$ equals $z$, equals $E[Y|X=1, Z=z] - E[Y|X=0, Z=z]$. If we look at units where $Z$ is held constant, the association between $X$ and $Y$ does come about because $X$ affects $Y$.

So, what we want to do is calculate $E[Y|X=1, Z=z] - E[Y|X=0, Z=z]$. One thing that is annoying is that the treatment effect can vary with $z$ now. We need to either consider many different treatment effects, one corresponding to every value taken by $Z$, or average them with $\Pr(Z)$.

Now let's return to reality. We can't observe $\Pr(Y,X,Z)$; we have to estimate it. Typically, we estimate the conditional expectations directly, with regressions. Your alternative does exactly that. The average marginal effect of your logistic regression is an estimate of $\sum_z (E[Y|X=1, Z=z] - E[Y|X=0, Z=z]) \Pr(Z=z)$, where $E[Y|X=x, Z=z]$ is modelled as $\frac{1}{1 + \exp{(-\alpha - \beta x - \gamma z)}}$ and $\Pr(Z=z)$ as the frequency with which $Z=z$ occurs in your sample.
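
To make the average marginal effect concrete, here is a rough sketch. It assumes a binary outcome (so that the logistic model for $E[Y|X=x, Z=z]$ fits naturally) and simulated data with hypothetical coefficients; `fit_logistic` is a helper written for this example, not a library function.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def fit_logistic(design, target, n_iter=25):
    """Fit P(target=1 | design) = sigmoid(design @ beta) by Newton's method."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = sigmoid(design @ beta)
        gradient = design.T @ (target - p)
        hessian = (design * (p * (1.0 - p))[:, None]).T @ design
        beta += np.linalg.solve(hessian, gradient)
    return beta

# Simulated data (hypothetical): binary outcome, z confounds x and y.
rng = np.random.default_rng(0)
n = 20000
z = rng.normal(size=n)
x = (rng.random(n) < sigmoid(0.8 * z)).astype(float)
y = (rng.random(n) < sigmoid(-0.5 + 1.0 * x + 1.0 * z)).astype(float)

# Model E[Y | X=x, Z=z] as 1 / (1 + exp(-alpha - beta*x - gamma*z)).
alpha, beta, gamma = fit_logistic(np.column_stack([np.ones(n), x, z]), y)

# For every unit, predict the outcome with x set to 1 and with x set to 0,
# then average the difference over the sample's own values of z.
pred_treated = sigmoid(alpha + beta + gamma * z)
pred_control = sigmoid(alpha + gamma * z)
ame = np.mean(pred_treated - pred_control)

# Known only because the data are simulated: the same average under the true model.
true_effect = np.mean(sigmoid(0.5 + z) - sigmoid(-0.5 + z))
naive = y[x == 1].mean() - y[x == 0].mean()

print("average marginal effect:", ame)
print("true average effect:    ", true_effect)
print("naive difference:       ", naive)
```

Averaging over the observed z values is what plays the role of weighting by $\Pr(Z=z)$ in the sum above.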

Propensity score matching is primarily a way of controlling for $Z$ (conditioning on it in the regression) when $Z$ has many components. It turns out that conditioning on the propensity score $p(Z)$, a scalar, is as good, for some purposes, as conditioning on $Z$ even when it's a vector. It's as good for rendering the potential outcomes $Y_{X=1}$ and $Y_{X=0}$ independent of $X$ when conditioning on it. This is why propensity scores work.
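
Written in the notation used here, and taking $p(Z)$ to mean the probability of treatment given $Z$ (with that probability strictly between 0 and 1), the property being relied on can be stated as:

$$p(Z) = \Pr(X = 1 \mid Z), \qquad (Y_{X=1}, Y_{X=0}) \perp X \mid Z \;\Longrightarrow\; (Y_{X=1}, Y_{X=0}) \perp X \mid p(Z).$$

In words: if controlling for all of $Z$ is enough to remove confounding, then controlling for the single number $p(Z)$ is enough too.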

Long story short: Yes, the two approaches should give similar results. And if $Z$ is one-dimensional, there is, as far as I know, no reason to use propensity scores. You can easily use $Z$ directly. Even if $Z$ has many components, you can just add them all to a regression, and this is still a more widely used control strategy than propensity scores.

Thanks. Not sure I got all the statistics behind it, but at least I understand that I won't be too wrong if I use a regression on both Z and X (which I understand) instead of propensity scores (which I still don't fully). — Anders, Feb 12 '21 at 14:13
If it's causality you want, the most important question is whether controlling for $Z$ is enough. Whether, once you've controlled for that, there is no other reason why $Y$ and $X$ might be correlated, except because $X$ affects $Y$. This has nothing to do with statistics, and everything to do with what you know about these variables. If yes, the regression on Z and X will generally give you a good idea of the effect of X. Even plain OLS, no logit. — CloseToC, Feb 12 '21 at 19:01
