Why and when would you use Regression Discontinuity (RD) instead of a linear model with a dummy?

Question

More specifically, why can't you just run an OLS regression with a dummy for the treatment group instead of an RD? What complications might arise if you do? And as an extension of that, what criteria should you look at when deciding whether to run an OLS or an RD?

dimitriy · Answer 1 · 2021-12-10T21:30:01.277

The issue is that treated and control groups are not comparable because treatment is not random, but based on a cutoff rule. Observations around the cutoff are more comparable, so the variation near that point is more plausibly experiment-like.

For example, take COVID vaccine effectiveness. Now suppose people over 65 years of age are eligible to be jabbed, but the younger folks are not. To make the problem simpler, let's assume that everyone wants the vaccine if eligible. If you just compare the mortality of vaccinated old people and unvaccinated young people, the effect of the vaccine is conflated with age plus a whole bunch of other stuff that's different between the old and young, like health. This is the comparison that the regression of mortality on a vax dummy corresponds to.

But if you focus on folks who are 64 years and 364 days old and compare them with those who are 65 years and 1 day old, that is more of an apples-to-apples comparison, since these groups are likely to be similar on all other dimensions on average and the only remaining difference is vaccination status. This helps you isolate the effect of the vaccine, though only for those in the 64-65 age group. RD accomplishes this by weighting the folks around the cutoff more than those further away, in various ways.
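To make the contrast concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the age range, the linear age gradient in mortality risk, and the `TRUE_EFFECT` value are assumptions, not estimates of anything real. It compares the naive vaccinated-vs-unvaccinated gap (what the regression on a vax dummy gives you) with the gap among people within one year of the cutoff.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Hypothetical population: ages spread uniformly between 40 and 90.
age = rng.uniform(40, 90, n)

# Sharp cutoff rule with full take-up: everyone 65 or older is vaccinated.
vax = (age >= 65).astype(float)

# Assumed data-generating process (illustrative numbers only):
# mortality risk rises smoothly with age, and the vaccine lowers it by 3 points.
TRUE_EFFECT = -0.03
risk = 0.01 + 0.002 * (age - 40) + TRUE_EFFECT * vax
died = (rng.uniform(size=n) < risk).astype(float)

# Naive comparison: all vaccinated vs. all unvaccinated.
# This equals the slope from regressing mortality on the vax dummy alone.
naive_diff = died[vax == 1].mean() - died[vax == 0].mean()

# Local comparison: only people within one year of the cutoff.
near = np.abs(age - 65) <= 1.0
local_diff = died[near & (vax == 1)].mean() - died[near & (vax == 0)].mean()

print(f"true effect:           {TRUE_EFFECT:+.4f}")
print(f"naive difference:      {naive_diff:+.4f}")
print(f"near-cutoff difference: {local_diff:+.4f}")
```

Under these assumed numbers the naive gap is positive in expectation (roughly +0.02), because the vaccinated are on average 25 years older, so the vaccine looks harmful. The near-cutoff gap has the right sign and sits close to the true effect, with a small leftover bias from the one-year age difference that a local regression in age would remove.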

You would use vanilla OLS if treatment was random or you had a great model of how the treatment is assigned. You would do RD if there was some sort of cutoff. But RD only works if there is nothing else that alters mortality around that age, which is not the case for the US since you become eligible for Medicare (government-sponsored health insurance) at 65. It will also fail if people can falsify their age, and that is correlated with health. For example, if sicker people are more likely to take the risk of getting caught, that will bias the treatment group mortality since its composition is now less healthy. There are also other flavors of RD (fuzzy, spatial, kinky) that I won't get into.

That actually makes a lot of sense, thanks a lot! Just for clarification: let's say you ran an OLS regression on the 64-65 age group to estimate the effectiveness of the vaccine, instead of an RD. Then this wouldn't weight ages further from 65 less than those closer to 65. So basically this OLS would just have less statistical power, resulting in a higher chance of biased estimators and bigger standard errors? — Brockenspook, Dec 10 '21 at 23:43
The regression you propose gives everyone within a year of 65 a weight of one, and everyone else a weight of zero. So it’s basically RD with a rectangular kernel and a bandwidth of one year. You can probably do better with something fancier. — dimitriy, Dec 10 '21 at 23:59
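As a rough sketch of what "rectangular kernel with a bandwidth of one year" versus "something fancier" can look like, the snippet below fits a local linear regression on each side of the cutoff by weighted least squares. The function name `rd_estimate`, the triangular kernel as the "fancier" option, and the simulated data are my own illustrative choices, and real RD software also picks the bandwidth and corrects the standard errors, which this does not.

```python
import numpy as np

def rd_estimate(x, y, cutoff, bandwidth, kernel="rectangular"):
    """Local linear sharp-RD estimate of the jump in y at the cutoff.

    Hypothetical helper for illustration; returns the point estimate only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dist = (x - cutoff) / bandwidth

    if kernel == "rectangular":
        # Weight one inside the bandwidth, zero outside.
        w = (np.abs(dist) <= 1).astype(float)
    elif kernel == "triangular":
        # Weight declines linearly to zero at the edge of the bandwidth.
        w = np.clip(1 - np.abs(dist), 0, None)
    else:
        raise ValueError(f"unknown kernel: {kernel}")

    keep = w > 0
    xc = x[keep] - cutoff               # running variable centered at the cutoff
    d = (xc >= 0).astype(float)         # treatment dummy
    # Separate intercept and slope on each side; the coefficient on d is the jump.
    X = np.column_stack([np.ones_like(xc), d, xc, d * xc])

    # Weighted least squares via the square-root-weight transformation.
    sw = np.sqrt(w[keep])
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y[keep] * sw, rcond=None)
    return beta[1]

# Simulated data (assumed numbers): curved age trend plus a jump of -0.03 at 65.
rng = np.random.default_rng(1)
age = rng.uniform(55, 75, 20_000)
y = (0.05 + 0.002 * (age - 65) + 0.0003 * (age - 65) ** 2
     - 0.03 * (age >= 65) + rng.normal(0, 0.02, age.size))

for k in ("rectangular", "triangular"):
    print(k, rd_estimate(age, y, cutoff=65, bandwidth=1.0, kernel=k))
```

With the rectangular kernel this is exactly OLS on the subsample within one year of 65, with the dummy, age, and their interaction as regressors. The triangular kernel uses the same people but leans harder on those closest to the cutoff.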
