
Consider the model $y_{it} = \alpha_i + \beta_{it}did_{it} + \gamma_{it} + \phi_i + \zeta_t + \varepsilon_{it}$

for group $i$ and year $t$. $\phi_i$ refers to group fixed effects and $\zeta_t$ to year fixed effects.

I am interested in the effect of the difference-in-differences variable $did_{it}$ on outcome $y_{it}$. However, I find that covariate $\gamma_{it}$ also fulfills the difference-in-differences assumption of parallel trends in the pre-treatment period, and that the effect of $did_{it}$ on $\gamma_{it}$ is significant.

Question 1: There is a plausible story for why $did_{it}$ might have a significant effect on $\gamma_{it}$, but it is not part of my original hypothesis. Should I report this result?

Question 2: Should I consider $\gamma_{it}$ as a 'bad covariate' (something that is part of the effect of $did_{it}$ on $y_{it}$ rather than a genuine control) and eliminate it from the regression model?

pythonuser
  • Welcome back! Your $\gamma X_{it}$ should represent *time-varying* covariates. A couple of follow-up questions. Did you regress one of these variables on your main treatment indicator? What do you mean when you speak of the effect of the treatment indicator, $did_{it}$, on this covariate(s)? – Thomas Bilach Mar 16 '20 at 11:38
  • Small note with regard to notation: you could drop the $i$ subscript on $\alpha$ (global intercept), since $\phi_i$ is already your group fixed effects for groups $i$. Also, $\beta$ could be unsubscripted. Later, if you introduce any leads or lags, you could use $\beta_t$. – Thomas Bilach Mar 16 '20 at 11:52
  • Thanks! Separately from the main regression model, I regress the covariate on $did_{it}$ (as well as the treatment group and time dummies) and find a significant effect. I further find that if I plot the covariate as an outcome variable, it fulfills the parallel trends assumption. – pythonuser Mar 17 '20 at 18:35
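Putting the two notation comments together, the specification would read as follows. This is a sketch: it assumes the time-varying covariate is written $X_{it}$ with coefficient $\gamma$, as the first comment suggests, and keeps the question's other symbols.

$$ y_{it} = \alpha + \beta\, did_{it} + \gamma X_{it} + \phi_i + \zeta_t + \varepsilon_{it} $$

If leads or lags of treatment are added later, $\beta$ would pick up a time subscript, $\beta_t$.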



Based on your questions, it appears you are most concerned with covariate balance, which is why you ran the following model:

$$ \mathrm{Covariate}_{it} = \beta \mathrm{DiD}_{it} + \phi_{i} + \zeta_{t} + \varepsilon_{it}, $$

where you regressed the group-by-period covariate(s) on treatment to test for compositional changes. To be clear to anyone reading, this simply moves a control $X_{it}$ from the right-hand side to the left-hand side of the equation; in other words, the covariate replaces your outcome. Under the null of no compositional change, we would expect $\beta = 0$. In practice, you should avoid controls that are themselves affected by treatment.
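To make the balance regression concrete, here is a minimal sketch on simulated data. It assumes a balanced panel with half the units treated from a common year onward; the helper names `simulate_panel` and `twfe_beta` and the parameter `covariate_shift` are hypothetical, only NumPy is used, and it is an illustration rather than a substitute for an econometrics package with clustered standard errors. The covariate sits on the left-hand side and is regressed on the DiD indicator plus a dummy for every unit and every year except the first.

```python
import numpy as np


def simulate_panel(n_units=400, n_years=10, first_treated_year=5,
                   covariate_shift=0.0, seed=0):
    """Simulate a balanced panel in which the first half of units is treated
    from `first_treated_year` onward. `covariate_shift` (hypothetical) is how
    much treatment moves the covariate; 0.0 means no compositional change."""
    rng = np.random.default_rng(seed)
    unit = np.repeat(np.arange(n_units), n_years)
    year = np.tile(np.arange(n_years), n_units)
    treated_group = unit < n_units // 2
    did = (treated_group & (year >= first_treated_year)).astype(float)
    unit_effect = rng.normal(size=n_units)[unit]
    year_effect = rng.normal(size=n_years)[year]
    covariate = (unit_effect + year_effect + covariate_shift * did
                 + rng.normal(size=unit.size))
    return unit, year, did, covariate


def twfe_beta(lhs, did, unit, year):
    """OLS coefficient on `did` with a dummy for every unit and for every
    year except the first (the unit dummies play the role of the intercept)."""
    unit_dummies = (unit[:, None] == np.unique(unit)[None, :]).astype(float)
    year_dummies = (year[:, None] == np.unique(year)[None, 1:]).astype(float)
    design = np.column_stack([did, unit_dummies, year_dummies])
    coef, *_ = np.linalg.lstsq(design, lhs, rcond=None)
    return coef[0]


# Balance regression: the covariate takes the place of the outcome.
unit, year, did, cov = simulate_panel(covariate_shift=0.0)
print("no compositional change:", round(twfe_beta(cov, did, unit, year), 3))

unit, year, did, cov = simulate_panel(covariate_shift=1.0)
print("covariate moved by treatment:", round(twfe_beta(cov, did, unit, year), 3))
```

With this seed the first estimate should sit near 0 and the second near 1. For the question at hand, the size of that second number relative to the covariate's scale is more informative than whether it clears a significance threshold.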

Question 1: There is a plausible story for why $did_{it}$ might have a significant effect on $\gamma_{it}$, but it is not part of my original hypothesis. Should I report this result?

It depends on your audience. You should articulate why you excluded specific covariates, though this should be obvious if you've described your study design well. I wouldn't worry so much about plotting the evolution of your time-varying covariates. I would be more concerned with the magnitude of the compositional change than with the statistical significance of this regression.

Though not directly related to your question, see this paper for a discussion of covariate balance regressions.

Question 2: Should I consider $\gamma_{it}$ as a 'bad covariate' (something that is part of the effect of $did_{it}$ on $y_{it}$ rather than a genuine control) and eliminate it from the regression model?

If such control variables are, in fact, outcomes of treatment, then these should not be included in your model.

One of the reasons to include covariates, $X_{it}$, in a DD model is to improve precision: covariates that explain variation in the outcome soak up residual variation, which shrinks the standard error of your treatment effect estimate. Oftentimes, a reviewer might pressure you to incorporate specific controls. If the magnitude of the compositional differences is not too drastic, I would estimate your model with and without $X_{it}$. If the estimates of $\beta$ are not appreciably different, then I would recommend reporting both and moving on.
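The with-and-without comparison, and the 'bad control' problem behind Question 2, can both be illustrated with a small simulation. This is a sketch under assumed parameters: a true direct effect of 2.0, a covariate slope of 1.5, and, in the 'bad' case, treatment raising the covariate by 1.0. All function names are hypothetical, and the standard errors are plain homoskedastic OLS ones rather than the clustered errors you would normally report. With a genuine control, $\beta$ barely moves and its standard error shrinks; with a control that is itself an outcome of treatment, including it strips out the part of the effect that runs through the covariate.

```python
import numpy as np


def twfe_ols(y, regressors, unit, year):
    """OLS of y on `regressors` plus unit dummies and year dummies (first year
    dropped). Returns coefficients and homoskedastic standard errors for the
    regressors only."""
    unit_dummies = (unit[:, None] == np.unique(unit)[None, :]).astype(float)
    year_dummies = (year[:, None] == np.unique(year)[None, 1:]).astype(float)
    design = np.column_stack([regressors, unit_dummies, year_dummies])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    dof = design.shape[0] - design.shape[1]
    sigma2 = resid @ resid / dof
    vcov = sigma2 * np.linalg.inv(design.T @ design)
    k = regressors.shape[1]
    return coef[:k], np.sqrt(np.diag(vcov)[:k])


def simulate(covariate_shift, n_units=400, n_years=10, first_treated_year=5, seed=1):
    """Balanced panel; `covariate_shift` is how much treatment moves X
    (0.0 = genuine control, nonzero = X is an outcome of treatment)."""
    rng = np.random.default_rng(seed)
    unit = np.repeat(np.arange(n_units), n_years)
    year = np.tile(np.arange(n_years), n_units)
    did = ((unit < n_units // 2) & (year >= first_treated_year)).astype(float)
    unit_fe = rng.normal(size=n_units)[unit]
    year_fe = rng.normal(size=n_years)[year]
    x = unit_fe + year_fe + covariate_shift * did + rng.normal(size=unit.size)
    y = 2.0 * did + 1.5 * x + unit_fe + year_fe + rng.normal(size=unit.size)
    return y, x, did, unit, year


def with_and_without(covariate_shift):
    """Estimate beta (and its SE) without and with X among the regressors."""
    y, x, did, unit, year = simulate(covariate_shift)
    b_without, se_without = twfe_ols(y, np.column_stack([did]), unit, year)
    b_with, se_with = twfe_ols(y, np.column_stack([did, x]), unit, year)
    return b_without[0], se_without[0], b_with[0], se_with[0]


for label, shift in [("genuine control", 0.0), ("bad control", 1.0)]:
    b0, s0, b1, s1 = with_and_without(shift)
    print(f"{label}: without X {b0:.2f} ({s0:.3f}), with X {b1:.2f} ({s1:.3f})")
```

For the genuine control, expect roughly 2.0 in both columns, with a smaller standard error once X is included. For the bad control, the estimate without X should land near the total effect of 2.0 + 1.5 × 1.0 = 3.5, while the estimate with X falls back toward the direct effect of 2.0.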

In general, if your treatment is truly randomly assigned (wishful thinking, in practice), then adding controls should not appreciably change your treatment effect estimates. Outside of the caveat mentioned above, you should still account for time-varying covariates, if any, since these other variables might also explain variation in your outcome. This post also addresses the inclusion of covariates in DD applications.

Thomas Bilach
  • Thanks so much! – pythonuser Mar 18 '20 at 23:42
  • And not to belabor an earlier point, but $\phi_{i}$ should represent a dummy for each *unit* in your sample. I know we often say "group" fixed effects, but this isn't necessarily a dummy for one group; it is a dummy for *each* cross-sectional unit. This is a 'generalized' DD method, so I wanted to be clear that this represents dummies for all units (e.g., firms, counties, states, etc.). – Thomas Bilach Mar 19 '20 at 22:25
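To make "a dummy for each cross-sectional unit" concrete, the sketch below (simulated data, NumPy only, hypothetical function names) builds one indicator column per unit rather than one per treatment group, and checks the resulting $\beta$ against the two-way within transformation $\tilde{y}_{it} = y_{it} - \bar{y}_{i\cdot} - \bar{y}_{\cdot t} + \bar{y}_{\cdot\cdot}$. The equivalence assumes a balanced panel with no additional covariates; in an unbalanced panel this simple double-demeaning formula no longer reproduces the dummy regression.

```python
import numpy as np

rng = np.random.default_rng(2)
n_units, n_years, first_treated_year = 50, 8, 4
unit = np.repeat(np.arange(n_units), n_years)
year = np.tile(np.arange(n_years), n_units)
did = ((unit < n_units // 2) & (year >= first_treated_year)).astype(float)
y = (1.0 * did + rng.normal(size=n_units)[unit]
     + rng.normal(size=n_years)[year] + rng.normal(size=unit.size))


def beta_with_unit_dummies(y, did, unit, year):
    """One dummy per unit (not per treatment group) plus year dummies."""
    unit_dummies = (unit[:, None] == np.unique(unit)[None, :]).astype(float)
    year_dummies = (year[:, None] == np.unique(year)[None, 1:]).astype(float)
    design = np.column_stack([did, unit_dummies, year_dummies])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], unit_dummies.shape[1]


def double_demean(v, unit, year):
    """Two-way within transformation; valid as written only for balanced panels."""
    unit_mean = np.bincount(unit, weights=v) / np.bincount(unit)
    year_mean = np.bincount(year, weights=v) / np.bincount(year)
    return v - unit_mean[unit] - year_mean[year] + v.mean()


def beta_within(y, did, unit, year):
    """Slope of double-demeaned y on double-demeaned did."""
    d, yy = double_demean(did, unit, year), double_demean(y, unit, year)
    return (d @ yy) / (d @ d)


b_dummies, n_unit_dummies = beta_with_unit_dummies(y, did, unit, year)
b_within = beta_within(y, did, unit, year)
print(n_unit_dummies, round(b_dummies, 6), round(b_within, 6))
```

The design has as many unit-dummy columns as there are units, and the two estimates agree up to floating-point error, which is one way to see that $\phi_i$ indexes units rather than the treated/control split.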