I don't really understand the difference between the strict exogeneity assumption in OLS and the strict exogeneity assumption in DiD (difference-in-differences). If they are the same, then what is the advantage of using DiD over plain OLS? After all, if DiD requires OLS assumptions + parallel trends, then it's actually harder to identify causality under DiD than OLS.
-
For clarification - what is DiD? (I deduced it, but others might not, or might just give up when they see an unknown acronym.) – jbowman Mar 20 '20 at 21:07
-
The plausibility of exogeneity depends on what's in the model. Exogeneity in DiD, when a unit with a similar pre-event trend in the outcome is available, is a lot more plausible than exogeneity in OLS. It's similar to how exogeneity is more plausible conditional on important controls than without them. The observed development in the target unit combines the effect of the treatment/reform and a trend that would've occurred anyway. DiD controls for the trend. – CloseToC Mar 20 '20 at 21:28
-
What do you mean when you say "the advantage of difference-in-differences over plain OLS"? We typically use OLS to estimate these types of equations. Unless you want to forgo modeling a time component and only estimate differences *across* units. Could you provide further clarity? – Thomas Bilach Mar 21 '20 at 18:54
-
@jbowman edited for clarity – pythonuser Mar 22 '20 at 20:27
-
@CloseToC thanks - are you saying that: given that the treatment and control units have similar pre-event trends, it is more likely that the strict exogeneity assumption is fulfilled in DiD than in OLS? – pythonuser Mar 22 '20 at 20:27
-
@Tom I understand that DiD is often estimated using OLS, but it is my impression that DiD provides more rigorous/plausible causal estimates than OLS provided that the parallel trends assumption holds. I have recently become aware that DiD also has to fulfill strict exogeneity, like in plain OLS, so I'm a bit confused about the utility of DiD for identification. For example, putting questions of sample size aside, what is the advantage of DiD using pre-treatment and post-treatment period data vs OLS using only post-treatment period data? – pythonuser Mar 22 '20 at 20:29
-
There are many advantages to using pre-event data. Do you have data on only *one* (or few) post-treatment periods? Or, would you like to know the advantages/disadvantages in general? – Thomas Bilach Mar 25 '20 at 15:22
-
@Tom Hi Tom, I'm interested in the advantages/disadvantages in general. – pythonuser Mar 25 '20 at 18:05
1 Answer
I want to address each one of your statements to ensure you are not confusing terminology.
I don't really understand the difference between the strict exogeneity assumption in OLS and the strict exogeneity assumption in DiD.
We do not make any fewer assumptions when we use ordinary least squares (OLS) to estimate a difference-in-differences (DD) equation.
If they are the same, then what is the advantage of using DiD over plain OLS?
We do not use DD over OLS. DD is a methodological framework assessing the relative outcomes of two groups across time. We typically use OLS (and other estimation methods) to estimate a DD equation. The power of the DD method lies in our ability to observe a counterfactual trend in the outcome for non-adopters/non-receivers of some policy/intervention.
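To illustrate that OLS is simply the estimator applied to a DD equation, here is a minimal sketch using only the Python standard library. The data, the variable names, and the small `ols` helper are all hypothetical and made up for illustration. It fits the saturated two-group, two-period equation (intercept, group dummy, post-period dummy, and their interaction); in that setup the interaction coefficient is the DD estimate.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Hypothetical observations: (treated dummy, post dummy, outcome).
data = [
    (0, 0, 9.0), (0, 0, 11.0),   # control, before
    (0, 1, 11.0), (0, 1, 13.0),  # control, after
    (1, 0, 14.0), (1, 0, 16.0),  # treated, before
    (1, 1, 19.0), (1, 1, 21.0),  # treated, after
]

# Design matrix: intercept, treated, post, treated x post.
X = [[1.0, d, t, d * t] for d, t, _ in data]
y = [obs for _, _, obs in data]

intercept, group_gap, common_trend, dd_effect = ols(X, y)
print(intercept, group_gap, common_trend, dd_effect)
```

In practice you would use a regression package rather than a hand-rolled solver; the point is only that the DD estimate is an ordinary regression coefficient, so the usual OLS assumptions still apply to it.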
Unless you want to eliminate the time component in a DD specification and run a model comparing outcomes across cross-sectional units, I am not sure what you mean when you compare “DD over plain OLS.” Based upon your comments, it appears you want to estimate a model using only post-treatment observations. You could do this, but this (alone), in my estimation, is a less powerful approach to identifying your treatment effect.
It should be noted that if you discard all of your pre-event data, you can’t even do a DD analysis. A “posttest” only evaluation would further reinforce issues concerning selection of units into the treatment. You also remove any chance of assessing the change in trend in the treated condition (i.e., the 'pre-post' change due to the implementation of the policy) relative to the control group (counterfactual trend). Remember: DD performs a double-difference across units and across time.
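As a rough illustration of the double difference, the sketch below compares a "posttest" only contrast with the DD contrast on made-up numbers. Everything here is hypothetical: the treated group is assumed to start 5 units higher than the control group (selection on a fixed characteristic), both groups share a common trend of 2, and the assumed true effect is 3.

```python
from statistics import mean

# Hypothetical unit-level outcomes by (group, period).
outcomes = {
    ("control", "pre"): [9.0, 10.0, 11.0],
    ("control", "post"): [11.0, 12.0, 13.0],
    ("treated", "pre"): [14.0, 15.0, 16.0],
    ("treated", "post"): [19.0, 20.0, 21.0],
}
cell_mean = {cell: mean(values) for cell, values in outcomes.items()}

def post_only_difference(m):
    """Difference across units using post-treatment data only."""
    return m[("treated", "post")] - m[("control", "post")]

def double_difference(m):
    """Pre-post change in the treated group minus that in the control group."""
    treated_change = m[("treated", "post")] - m[("treated", "pre")]
    control_change = m[("control", "post")] - m[("control", "pre")]
    return treated_change - control_change

print(post_only_difference(cell_mean))  # baseline gap plus effect
print(double_difference(cell_mean))     # effect only, under a common trend
```

With these numbers the post-only contrast is 8 (the baseline gap of 5 plus the effect of 3), while the double difference is 3. The second result relies on the assumed common trend; if the groups had trended differently absent treatment, the double difference would be off as well.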
Note again why DD offers improvements over a "posttest" only evaluation. DD methods allow for nonrandom selection into treatment; in particular, they allow for some selection on the basis of unobserved, time-invariant characteristics. In other words, selection into treatment may be confounded, so long as the confounders are not time-varying. In general, exogeneity may be violated if you omit important time-varying confounders that may also affect your outcome. At the very least, it is your job to demonstrate to your audience that the treatment is plausibly unconfounded. This previous post addresses endogeneity concerns and may be of some interest to you.
- Cautionary Note: You should avoid controls that are themselves affected by (or, are outcomes of) treatment. See Andy's response here for a discussion of 'bad' controls.
After all, if DiD requires OLS assumptions + parallel trends, then it's actually harder to identify causality under DiD than OLS.
Yes. You correctly note that if we are estimating a DD equation using OLS, the ordinary assumptions must hold. The strict exogeneity assumption can be stated in terms of a zero conditional expectation; this is akin to the assumption we make in a cross-sectional case. The only difference is now $y_{it}$ and $X_{it}$ are $t$-subscripted, which gives us another dimension to play with (and worry about too).
Suppose we are interested in the causal effect of a policy implemented in select jurisdictions throughout the country. In practice, we typically delineate a policy variable, $X_{it}$, where $i$ indexes units (e.g., counties) and $t$ indexes time (e.g., years). The assumption of strict exogeneity may be violated due to the omission of important time-varying unobservables, $u_{it}$, that are correlated with both $X_{it}$ and the outcome $y_{it}$.
Most panel data estimators rely on some form of a strict exogeneity assumption. You may see it expressed in texts as $\textrm{E}[X_{it}u_{is}] = 0$ for all $s$ and $t$. In words, the explanatory variable(s) is uncorrelated with the idiosyncratic error in each time period. Note, we also assume that the errors are orthogonal to all leads and lags of $X_{it}$. If you want to learn more about when exogeneity may fail in panel data contexts, then review the examples in these slides. Also, see page 19 of these lecture notes for an example of a mild form of non-exogeneity. I acknowledge that this is a very restrictive assumption in some DD applications. However, the assumption is more plausible when there is a parallelism in the group trends.
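For reference, here is how the condition is commonly written in conditional-mean form for a two-way fixed effects DD equation. The unit effects $\alpha_i$, period effects $\lambda_t$, and coefficient $\delta$ are notation assumed here for illustration; they do not appear in the question.

$$y_{it} = \alpha_i + \lambda_t + \delta X_{it} + u_{it}, \qquad \textrm{E}[u_{it} \mid X_{i1}, \dots, X_{iT}, \alpha_i] = 0 \quad \text{for } t = 1, \dots, T$$

The conditional statement implies $\textrm{E}[X_{it}u_{is}] = 0$ for all $s$ and $t$. Contemporaneous exogeneity, $\textrm{E}[X_{it}u_{it}] = 0$, restricts only the same-period pair. The difference matters when, for example, a past shock to the outcome influences whether a unit later adopts the policy: that kind of feedback is compatible with contemporaneous exogeneity but violates the strict version.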
Demonstrating the plausibility of unconfoundedness is often achieved visually with a plot of the group trends before policy adoption. Trend equivalence is often implicitly assumed. The group averages seldom move precisely in tandem over time. This is why we typically start with an inspection of the group trends. Assessing pre-trends helps us understand the magnitude of any possible confounding. It is your job to determine how much of the observed effect is due to the actual policy, and how much is actually due to other confounders. DD methods are powerful because the implementation of treatment is, typically, outside the control of the units of observation. Nature (or some exogenous event) does some of the randomization for us. If so, and the treatment is exogenous (rare), then time-varying confounders shouldn't matter much. This is why we exploit these events and why DD methods have become so powerful!
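A numerical companion to the visual inspection of pre-trends might look like the sketch below. The group means and names are hypothetical. It computes the treated-minus-control gap in each pre-policy period and the slope of that gap over time; a slope near zero is consistent with the groups moving roughly in parallel before adoption.

```python
from statistics import mean

# Hypothetical pre-policy group means, keyed by period relative to adoption.
pre_means = {
    -3: {"treated": 15.0, "control": 10.0},
    -2: {"treated": 16.1, "control": 11.0},
    -1: {"treated": 16.9, "control": 12.0},
}

def pre_period_gaps(means):
    """Return the sorted periods and the treated-minus-control gap in each."""
    periods = sorted(means)
    gaps = [means[p]["treated"] - means[p]["control"] for p in periods]
    return periods, gaps

def slope(x, y):
    """Least-squares slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den

periods, gaps = pre_period_gaps(pre_means)
gap_trend = slope(periods, gaps)
print(gaps)       # roughly constant gap in each pre-period
print(gap_trend)  # close to zero when pre-trends are similar
```

A flat pre-period gap does not prove that the trends would have stayed parallel after adoption; it only makes the assumption more credible. With real data you would also want a measure of uncertainty around the slope.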
In the real world, however, other factors may change over time and may also affect your outcome. For instance, real gross domestic product may increase/decrease within states over time; labor demand may ebb and flow periodically; prices may fluctuate on a quarterly basis; population size (or its composition) may change slowly across years. This is all context-specific; it depends on what policy is under evaluation.
You may also find this book chapter helpful.
