I am following work by Borusyak, 2021 which uses a triple-difference design:

In triple-difference designs, the data have three dimensions, such as regions $i$, demographic groups $g$, and periods $t$. Conventional static OLS estimation is based on the regression

$$ Y_{igt} = \alpha_{ig} + \alpha_{it} + \alpha_{gt} + \tau D_{igt} + \varepsilon_{igt} $$

His recommendation is therefore: when observations are defined by $(i,g,t)$, where, say, $i$ are counties and $g$ are age groups, specify a variable $ig$ identifying the $(i,g)$ pairs as the unit identifier and add the appropriate FEs:

did_imputation Y ig t Eig, fe(ig i#t g#t) cluster(i)

The event time term Eig should be specific to the $(i,g)$ pairs, not to $i$. For instance, Eig is missing for a never-treated age group in a county where other groups are treated at some point.
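To make the setup above concrete, here is a minimal sketch of how the pair identifier and the pair-specific event time could be built. The county names, years, and the `first_treated` mapping are made up for illustration, and `None` stands in for a missing value; this is not Borusyak's own code.

```python
# Hypothetical panel rows: (county i, age group g, year t).
rows = [(i, g, t)
        for i in ("A", "B")
        for g in ("young", "old")
        for t in range(2018, 2022)]

# Hypothetical first-treatment years keyed by the (i, g) pair.
# Pairs that are absent are never treated.
first_treated = {("A", "young"): 2020, ("B", "young"): 2019, ("B", "old"): 2021}

pair_ids = {}   # maps each (i, g) pair to an integer unit identifier
panel = []
for i, g, t in rows:
    ig = pair_ids.setdefault((i, g), len(pair_ids) + 1)
    eig = first_treated.get((i, g))   # None = missing = never treated
    panel.append({"i": i, "g": g, "t": t, "ig": ig, "Eig": eig})
```

Note that the old age group in county A keeps a missing Eig even though the young group in the same county is treated in 2020, which is exactly the case described above.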

I am wondering why this setting is quite different from what Kota Mori explained to me here:

$$ Y_{i,t} = \alpha_i + \beta_t + \gamma D_{i,t} + \delta g_i D_{i,t} $$

where person and time dummies for DiD purposes are $\alpha_i$ and $\beta_t$.

I am wondering why there are differences between these two models and why Borusyak controlled for up to three types of fixed effects.

Thomas Bilach
Louise

1 Answer

The difference-in-difference-in-differences (DDD) estimator you're referencing is typically used in settings where a law (i.e., treatment) is enacted in some non-uniform (i.e., staggered) manner across jurisdictions (e.g., states). In practice, you may have a mixture of early- and late-adopter states, and another subset of states that never adopt the new law in any period under observation. States rarely impose new legislation at the same time, and evaluators typically wish to exploit this variation in treatment timing. We may also suspect the new legislation affects sub-groups differently within the treated states. This is yet another layer of variation we can exploit, assuming we actually observe individuals over time within each state.

Suppose that, while evaluating the effect of anti-corruption laws on wages in the United States, you acquire detailed employment records for individuals nested within states. Now suppose you suspect the law has a differential impact by age group or gender. Say, for example, you stratify workers by age within each state. For simplicity, suppose theory suggests the law affects the earnings of younger employees differently than older employees. You dichotomize workers accordingly, with all employees under 35 years of age falling into the younger age category. The treatment now varies over three dimensions: age group $a$, state $s$, and year $t$. It's important to be aware that because the laws are introduced at different times, a standard three-way interaction term isn't going to work. We must define the law dummy to account for the staggered adoption periods.

The more general representation of the DDD equation is as follows:

$$ Y_{iast} = \gamma_{st} + \lambda_{at} + \eta_{as} + \delta L_{ast} + u_{iast}, $$

where $Y_{iast}$, which denotes the earnings of individual $i$ in age group $a$ in state $s$ and year $t$, is regressed on a full set of state $\times$ year effects (i.e., $\gamma_{st}$), age $\times$ year effects (i.e., $\lambda_{at}$), age $\times$ state effects (i.e., $\eta_{as}$), and a law dummy (i.e., $L_{ast}$). It's also permissible to include a concatenated variable for, say, state-year and then let software 'dummy out' all the relevant state-by-year effects for you. Software will invariably drop some of the second-level terms to break the collinearity, but it shouldn't affect your estimate of $\delta$. I should stress that you must include all of the second-level interaction terms or your model may be misspecified.
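To see why all three sets of pairwise effects belong in the model, it may help to look at the simplest non-staggered case: two states, two age groups, two periods. There, $\delta$ is the triple difference of cell means, and every pairwise effect cancels along the way. The sketch below uses made-up integer values for $\gamma_{st}$, $\lambda_{at}$, $\eta_{as}$, and $\delta$; none of them come from real data.

```python
# Hypothetical pairwise effects (all values invented for illustration).
gamma = {("treated", 0): 3, ("treated", 1): 7,
         ("control", 0): 1, ("control", 1): 4}        # state x year
lam = {("young", 0): 2, ("young", 1): 5,
       ("old", 0): 6, ("old", 1): 8}                  # age x year
eta = {("young", "treated"): 10, ("young", "control"): 20,
       ("old", "treated"): 30, ("old", "control"): 40}  # age x state
delta = 5  # assumed true effect of the law

def cell_mean(a, s, t):
    # The law applies only to the young, in the treated state, in period 1.
    law = int(a == "young" and s == "treated" and t == 1)
    return gamma[(s, t)] + lam[(a, t)] + eta[(a, s)] + delta * law

def change(a, s):
    # Before/after change for one age-by-state cell.
    return cell_mean(a, s, 1) - cell_mean(a, s, 0)

# DiD among the young minus DiD among the old.
ddd = ((change("young", "treated") - change("young", "control"))
       - (change("old", "treated") - change("old", "control")))
```

Here `ddd` recovers `delta` even though the state-year, age-year, and age-state effects all differ across cells. A model with only unit and time effects has no terms to absorb those components.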

Note that the principal variable of interest, $L_{ast}$, 'turns on' (i.e., switches from 0 to 1) in those $a$-$s$-$t$ combinations where the law is in place. It is your triple interaction term, just defined in a different way. According to your other post, it appears there is no well-defined period delineating pre-/post-treatment, so we must define the law indicator so that it captures the staggered onset of treatment across states for the subset of individuals more sensitive to the anti-corruption legislation (i.e., younger employees). Put differently, imagine the law dummy is a column of $0$'s. As you work your way down the rows, assign the observation a value of $1$ if the individual in your sample is under 35 years of age and nested within a treated state and is in a year after the anti-corruption law went into effect.
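The row-by-row rule for the law dummy can be sketched as a small function. The state names and adoption years are hypothetical, and treating the adoption year itself as treated (`year >= adopted`) is an assumption; use a strict inequality if the law only binds from the following year.

```python
# Hypothetical adoption years; states not listed never adopt the law.
adoption_year = {"CA": 2015, "TX": 2018}

def law_dummy(age, state, year, cutoff=35):
    """Return 1 if the worker is under the age cutoff, lives in a state
    that adopts the law, and is observed in or after the adoption year
    (the 'in or after' convention is an assumption)."""
    adopted = adoption_year.get(state)   # None for never-adopting states
    return int(age < cutoff and adopted is not None and year >= adopted)

# A few illustrative (age, state, year) rows.
sample = [(30, "CA", 2014), (30, "CA", 2016), (40, "CA", 2016),
          (30, "TX", 2018), (30, "NY", 2020)]
L = [law_dummy(age, state, year) for age, state, year in sample]
```

Because each state carries its own adoption year, the dummy switches on at different times across states, which a single post-period indicator cannot do.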

In the paper you referenced, it appears the author is clustering by region and estimating all second-order interaction terms. You could call these terms "fixed effects", but that's a bit misleading, because it suggests you only need a fixed effect at the individual, unit, and/or time level. That is not the case here. You should also estimate all state-year, age-year, and age-state effects to preserve the hierarchy.

Thomas Bilach
  • A great answer, @Thomas. Can I ask what "It's also permissible to include a **concatenated version of, say, state-year** and then **letting software 'dummy out' all the relevant state-by-year effects** for you." means? I did not fully get it. – Louise Sep 05 '21 at 21:30
  • You may have two separate columns, say state *and* year, in which case it's permissible to simply interact the two variables (e.g., `state#year`). In other circumstances, you may have one variable denoting, say, state-year (e.g., "California-2020"), in which case you could let software 'dummy out' the respective state-year pairs for you (e.g., `i.state_year`). Does that make sense? – Thomas Bilach Sep 06 '21 at 17:41
  • It makes sense, thanks :D – Louise Sep 06 '21 at 21:00