
A. Background:

Dasgupta (2019, pp. 2596-2597) sets up the baseline below to study the impact of leniency laws in a staggered difference-in-differences (DiD) design:

The baseline identification:

$$Y_{it} = \alpha + \beta\,(\text{Leniency Law})_{kt} + \delta X_{ikt} + \theta_t + \gamma_i + \epsilon_{it} \tag{1}$$

where $i$, $k$, and $t$ index firms, countries, and years, respectively. $X_{ikt}$ is a vector of firm, country, and industry controls, while $\gamma_i$ and $\theta_t$ are firm and year fixed effects.

The variable of interest here is $(\text{Leniency Law})_{kt}$. Dasgupta (2019, p. 2597) documents that this variable equals 0 before the passage of a leniency law in country $k$, and 1 afterward.

B. Author's description:

Dasgupta (2019, p. 2597) writes that:

the treated group comprises all firms that are headquartered in countries that have passed a leniency law by year t. The control group comprises firms in countries that never adopted a leniency law in our sample period and firms headquartered in countries that adopted a leniency law at some later point of time.

I also had some discussion on Economics and Cross Validated about the sample selection. The best answer regarding the treatment periods is @Thomas Bilach's:

The policy dummy 'turns on' (i.e., switches from 0 to 1) in their respective adoption periods. Any country not adopting a leniency law is coded 0 in all periods.

And it seems to align with the description of "Multiple Groups and Time Periods" in this paper (p. 456):

Luckily, the main features of the DID design also apply in a broader set of conditions. When $G \geq 2$ groups and $T \geq 2$ periods, $D_{gt} = 1$ if the treatment is active in group $g$ and period $t$; otherwise, $D_{gt} = 0$.

So, put more simply, if Korea passed the law in 1997, $(\text{Leniency Law})_{kt}$ for Korean firms takes a value of 1 from 1997 until 2012 (the end of the sample period). In the same line of thought, if Australia implemented the law in 2003, then $(\text{Leniency Law})_{kt}$ for Australian firms takes a value of 0 from 1990 to 2002 and a value of 1 from 2003 to 2012.
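To make the coding concrete for myself, below is a minimal Python (pandas/statsmodels) sketch of how I understand this dummy would be constructed and then dropped into a two-way fixed-effects version of equation (1). Everything in it is hypothetical for illustration only: the toy firms, the placeholder outcome and controls, and treating Japan as a never-adopter. It is not Dasgupta's data or code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical adoption years, following the example above; countries that never
# pass a leniency law (here, Japan) are simply absent from this mapping.
adoption_year = {"Korea": 1997, "Australia": 2003}

# Toy firm-year panel, 1990-2012, with a placeholder outcome and controls.
rng = np.random.default_rng(0)
firms = [("KR1", "Korea"), ("KR2", "Korea"), ("AU1", "Australia"), ("JP1", "Japan")]
panel = pd.DataFrame(
    [(f, c, y) for f, c in firms for y in range(1990, 2013)],
    columns=["firm_id", "country", "year"],
)
panel["y"] = rng.normal(size=len(panel))         # placeholder outcome
panel["size"] = rng.normal(size=len(panel))      # placeholder control
panel["leverage"] = rng.normal(size=len(panel))  # placeholder control

# (Leniency Law)_{kt}: 0 before the adoption year, 1 from the adoption year onward,
# and always 0 for never-adopters (the map returns NaN, so the comparison is False).
adopt = panel["country"].map(adoption_year)
panel["leniency_law"] = (panel["year"] >= adopt).astype(int)

# Equation (1) as a two-way fixed-effects regression. Dummying out firms with
# C(firm_id) is only practical in a small panel; with thousands of firms a dedicated
# fixed-effects estimator (e.g., linearmodels' PanelOLS) is the usual choice.
fit = smf.ols(
    "y ~ leniency_law + size + leverage + C(firm_id) + C(year)", data=panel
).fit(cov_type="cluster", cov_kwds={"groups": panel["country"]})
print(fit.params["leniency_law"])  # the DiD estimate of beta (pure noise here)
```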

C. Discussion and some questions about the Dasgupta (2019) paper

Based on what they presented, I have some questions below:

  1. What should be the end of the period?

Because from Dasgupta's description and Thomas Bilach's explanation, it seems that there is no defined point at which to stop the sample. Therefore, can I use 2017 as the end of my sample for two reasons: (1) the last country passed a leniency law in 2012, and (2) Dasgupta used the window [-2; +5] in his univariate results for plotting common trends?

  2. How do I plot a figure like Figure 1 in section 3.1 of Dasgupta (2019)?

[Figure 1 from Dasgupta (2019), section 3.1]

In a post, Thomas Bilach also mentioned a way to do that:

In my estimation, they used firms in the non-adopter countries but in the same industry to create the control group trend. In other words, they're assuming the untreated firms in the same SIC3 industry would have adopted the law around the same time as the treated firms in the same SIC3 industry.

Typically in event study frameworks we plot the coefficient values and not the raw trends across treatment/control groups. Thus, it doesn't appear to be a formal "pre-trends" test in my opinion. Their assessment of the parallel evolution of the trends seems rather ad hoc in my estimation, but warranted given the staggered nature of their treatment.

First of all, what does "coefficient value" mean? In Figure 1 of Dasgupta's paper, he uses mean values to plot the lines. Is it relevant to plot the "coefficient values" instead?

I agree with @Thomas Bilach that there seems to be no formal test for such a DiD setting. I also read "The Common Trends Assumption" section of this paper, but it seems there is no official test other than plotting the graph.

But as I asked Thomas Bilach, if I want to produce a plot like the one he describes, how do they choose the years from 1 to 5, and which year is the benchmark? Also, does this mean that, of the 64 countries in this paper, only the 9 non-adopter countries serve as controls and the 56 adopter countries form the treatment group when plotting?

Louise
  • I saw in one of your given [links](https://stats.stackexchange.com/questions/526928/a-doubt-on-different-ways-of-performing-staggered-did) that the answer also mentioned at the end that the plotting is quite suspicious. However, I also do not understand clearly how Dasgupta classifies his sample. One vote for your efforts. – Louise Jun 03 '21 at 09:03

1 Answer


What should be the end of the period?

The span of the "effect window" is often arbitrarily chosen but it should be concentrated around the exposure period. We should expect relatively stable trends in the pre-period. If any volatility is present, we should expect it to be concentrated in the post-period. Note the schism after $t = 2$. It appears to last for a couple of periods before stabilizing. In my opinion, the purpose of limiting the observation window is to see whether the volatility comports with the timing of the law change.

Because from Dasgupta's description and Thomas Bilach's explanation, it seems that there is no defined point at which to stop the sample.

Technically, this statement is correct. The authors could have expanded their window to include additional periods.

Therefore, can I use 2017 as the end of my sample for two reasons: (1) the last country passed a leniency law in 2012, and (2) Dasgupta used the window [-2; +5] in his univariate results for plotting common trends?

Sure.

But do you even observe outcomes beyond 2012? Remember, the authors are plotting aggregate trends. Each entity is some amount of time units relative to the immediate adoption period.

How do I plot a figure like Figure 1 in section 3.1 of Dasgupta (2019)?

I acknowledge that a full reproduction of their figure requires a more detailed understanding of how they defined the relative adoption periods for non-adopter countries. It appears the "pairing" of firms was achieved via the industry code. Based upon my reading of the paper, they're assuming the untreated firms in the same SIC3 industry would have adopted the law around the same time as the treated firms in the same SIC3 industry.

I acknowledge that their description of how they achieved this across all jurisdictions isn't very clear. Once you pair by industry, you may find a subset of treated jurisdictions with similar adoption years. It seems plausible to assume the untreated firms would have adopted a leniency law around the same time as the treated firms within the same industry and with similar event dates. The authors don't explicitly indicate that they limited their sample, so my explanation is somewhat conjectural. I should also acknowledge that your suspicions seem warranted. It is inappropriate to assign a relative exposure interval to untreated firms when the adoption years among treated firms vary so widely.


Let's switch gears and focus on plotting coefficient values. In settings with a staggered treatment, it's a plot of the relative period indicators (i.e., lead/lag dummies). I will sample some countries from Table 2 in their paper, ensuring at least one country is never treated. The treatment history for three randomly selected countries is as follows:

  • Jordan never adopts (i.e., it's always 0)
  • Belgium adopts early (i.e., 'turns on' in 2004)
  • Iceland adopts late (i.e., 'turns on' in 2005)

The data frame below is a country-year panel for simplicity. The time dimension is truncated to include the years 2000–2009. The variable $T_{kt}$ is the treatment dummy. It equals 1 if a country is treated and is in a post-treatment period, and 0 otherwise. I also include indicators for years 1 and 2 before the law change, years 0–2 after, and year 3 onward. The endpoint is "binned" to index all periods 3 or more years after the law change.

Note this is a finite window. It's perfectly permissible to include lead and lag indicators in more distant periods in either direction. It is also quite popular to trace out the full dynamic response to treatment, though estimates often get quite noisy as the window gets wider.

$$
\begin{array}{ccccccccc}
\text{country} & \text{year} & T_{kt} & d^{-2}_{kt} & d^{-1}_{kt} & d^{0}_{kt} & d^{+1}_{kt} & d^{+2}_{kt} & d^{+\bar{3}}_{kt} \\
\hline
\text{Jordan} & 2000 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{Jordan} & 2001 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{Jordan} & 2002 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{Jordan} & 2003 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{Jordan} & 2004 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{Jordan} & 2005 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{Jordan} & 2006 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{Jordan} & 2007 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{Jordan} & 2008 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{Jordan} & 2009 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\hline
\text{Belgium} & 2000 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{Belgium} & 2001 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{Belgium} & 2002 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
\text{Belgium} & 2003 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
\text{Belgium} & 2004 & 1 & 0 & 0 & 1 & 0 & 0 & 0 \\
\text{Belgium} & 2005 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
\text{Belgium} & 2006 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
\text{Belgium} & 2007 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
\text{Belgium} & 2008 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
\text{Belgium} & 2009 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
\hline
\text{Iceland} & 2000 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{Iceland} & 2001 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{Iceland} & 2002 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{Iceland} & 2003 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
\text{Iceland} & 2004 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
\text{Iceland} & 2005 & 1 & 0 & 0 & 1 & 0 & 0 & 0 \\
\text{Iceland} & 2006 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
\text{Iceland} & 2007 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
\text{Iceland} & 2008 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
\text{Iceland} & 2009 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
\end{array}
$$
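For concreteness, here is a minimal pandas sketch of how one could construct these indicators from a mapping of adoption years. The column names (`event_time`, `lead2`, `lag0`, `lag3bin`, etc.) are my own and purely illustrative; they correspond to $d^{-2}_{kt}$ through $d^{+\bar{3}}_{kt}$ in the table.

```python
import numpy as np
import pandas as pd

# Toy country-year panel matching the table above: Jordan never adopts,
# Belgium 'turns on' in 2004, Iceland 'turns on' in 2005.
adoption = {"Jordan": np.nan, "Belgium": 2004, "Iceland": 2005}
panel = pd.DataFrame(
    [(c, y) for c in adoption for y in range(2000, 2010)],
    columns=["country", "year"],
)

# Event time = calendar year minus adoption year (NaN for never-adopters, so every
# comparison below evaluates to False and Jordan is coded 0 throughout).
panel["event_time"] = panel["year"] - panel["country"].map(adoption)

panel["T"] = (panel["event_time"] >= 0).astype(int)  # treatment dummy T_kt
# Relative-period dummies: lead2 = d^{-2}, lead1 = d^{-1}, lag0 = d^{0}, and so on.
for k, name in [(-2, "lead2"), (-1, "lead1"), (0, "lag0"), (1, "lag1"), (2, "lag2")]:
    panel[name] = (panel["event_time"] == k).astype(int)
panel["lag3bin"] = (panel["event_time"] >= 3).astype(int)  # binned endpoint, 3+ years out

print(panel[panel["country"] == "Belgium"])  # compare with the Belgium rows above
```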

Note $d^{-2}_{kt}$ is a lead indicator. It 'turns on' if a country is treated and is two periods before the immediate adoption year. Likewise, $d^{-1}_{kt}$ 'turns on' if a country is treated and is one period before the immediate adoption year. The full specification would look something like the following:

$$ y_{kt} = \alpha_k + \lambda_t + \delta_{-2} d_{k,t-2} + \delta_{-1} d_{k,t-1} + \delta d_{kt} + \delta_{+1} d_{k,t+1} + \delta_{+2} d_{k,t+2} + \delta_{+\bar{3}} d_{k,t+\bar{3}} + \epsilon_{kt}, $$

where I include some leads and lags of the policy dummy. The "coefficient values" refer to the estimates on each of the $\delta$'s. Note how $\delta$ is the immediate effect of a leniency law on your outcome(s) of interest; it's un-subscripted. I could've specified it as $\delta_0$ if I wanted to be explicit, but I think you get the basic idea. The major benefit of graphing these coefficients is that both the post-event effects and the identifying assumption of "no pre-event trends" become immediately apparent.
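Continuing the toy panel from the sketch above, the specification can be estimated and the $\delta$'s plotted roughly as follows. A placeholder outcome is simulated purely so the code runs; `lead1` (i.e., $d^{-1}_{kt}$) is the omitted reference period, and with only three toy countries the clustered standard errors are for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.formula.api as smf

# Placeholder outcome with a built-in treatment effect, just so the example runs.
rng = np.random.default_rng(0)
panel["y"] = 0.5 * panel["T"] + rng.normal(scale=0.1, size=len(panel))

# Event-study regression: country and year fixed effects plus the lead/lag dummies;
# lead1 (the period just before adoption) is left out as the reference category.
leads_lags = ["lead2", "lag0", "lag1", "lag2", "lag3bin"]
fit = smf.ols(
    "y ~ " + " + ".join(leads_lags) + " + C(country) + C(year)", data=panel
).fit(cov_type="cluster", cov_kwds={"groups": panel["country"]})

# Plot the estimated deltas against event time with pointwise 95% confidence intervals.
coefs = fit.params[leads_lags]
ci = fit.conf_int().loc[leads_lags]
event_time = [-2, 0, 1, 2, 3]  # lag3bin is plotted at 3, its left endpoint
plt.errorbar(event_time, coefs, yerr=[coefs - ci[0], ci[1] - coefs], fmt="o", capsize=3)
plt.axhline(0, linestyle="--", color="grey")
plt.xlabel("Years relative to leniency law adoption")
plt.ylabel("Coefficient estimate")
plt.show()
```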

Let's review a few examples in applied work. Here is one by Autor 2003, which has become sort of the canonical example in economics. They estimated the impact of implied contract exceptions on log state temporary help supply.

[Figure: estimated impact of the implied contract exception (Autor 2003)]

Note the author includes indicators for 1 and 2 years before adoption, 0–3 after adoption, and year 4 forward. Some authors fail to mention how they handle the endpoints in situations where the window is finite, though here we're explicitly told the indicator 'turns on' at year 4 and stays on. The endpoint is "binned" and is consistent with the approach I used above (i.e., $d^{+\bar{3}}_{kt}$).

The points represent the coefficients on the leads and lags of adoption of the public policy and good faith exceptions. Note the coefficients on the adoption leads are close to zero, which suggests little evidence of an anticipatory response within states about to adopt an exception. Though the purpose of this exercise is to investigate a dynamic response to treatment, it's also attesting to the validity of the difference-in-differences design. We should expect the coefficients on the lead indicators to be bounded around zero. It's also worth highlighting that the authors acquired employment data from 1979–1995, so they didn't have to limit themselves to a finite number of adoption leads.

In another paper Venkataramani et al. 2019 investigate the association between college affirmative action bans and health risk behaviors among underrepresented minority adolescents. The authors use data from the 1991–2015 U.S. Youth Risk Behavior Survey, a nationally representative repeated cross-sectional survey of 9th–12th graders in public and private schools. I reproduced a figure on page 11 of their paper.

[Figure: event-study coefficient plot for affirmative action bans (Venkataramani et al. 2019)]

To achieve a similar plot, simply replace the policy dummy with indicators denoting the timing of interview relative to the policy change. Note how the model is saturated with a full series of lead and lag indicators. They account for all periods before and after the ban. The 2-year period before the ban (event time −2/−1) was denoted as the reference period. The blue diamonds denote the coefficients on the lead and lag indicators. In short, they compare the difference in the prevalence of the outcome, for each point in event time relative to the reference period, between individuals living in states where a ban was implemented versus individuals living in states where a ban was not implemented. The displays suggest little evidence of pre-existing trends in cigarette smoking and alcohol use. In other words, there's no differential pre-existing trends in the outcomes between exposed and unexposed states. In this setting, the authors were very explicit about their choice of reference and appear to have binned the endpoints.

In another paper Green et al. 2015 evaluate the effects of liberalizing bar hours on traffic fatalities. Their window only considered a finite number of leads and lags, though it's a bit more balanced. Here is a figure from page 197 which reports the coefficients on the leads and lags of treatment.

[Figure: coefficients on the leads and lags of extending bar operating hours (Green et al. 2015)]

The authors restrict attention to the periods around the initial adoption year. They don't expressly indicate how they modeled the endpoints, though the leads and lags of more distant periods may have been considered. Note the coefficients on the lead dummies suggest no pre-policy effects.


Note how the "effect window" varies across studies. There is no optimal lead and/or lag structure. However, in many difference-in-differences applications the authors typically include lead indicators to test for evidence of pre-trends. In table 10 of Dasgupta's work they report estimates of indicator variables from 1 to 4 years before the law is adopted. Similar to the cited research, this is a form of pre-trends testing.

Thomas Bilach
  • Thank you for your dedicated answer, Thomas Bilach. Can I ask a couple of questions: (1) I do not fully understand the meaning of the sentence "Note the schism after t=2"; in other words, what does "schism" mean here? Sorry, English is my second language, so sometimes I cannot catch the idea. – Louise Jun 07 '21 at 08:04
  • (2) Regarding this sentence: "**But do you even observe outcomes beyond 2012? Remember, the authors are plotting aggregate trends. Each entity is some amount of time units relative to the immediate adoption period.**" By "But do you even observe outcomes beyond 2012?", do you mean whether I have data on the dependent variables from 2012 to 2017? And can you please help me clarify the sentence "**Each entity is some amount of time units relative to the immediate adoption period.**" further? I guess that "entity" here means firm, but I cannot fully understand the sentence. Much appreciated. – Louise Jun 07 '21 at 08:08
  • (3) "**it's a plot of the *relative* period indicators**", do you mean converting the year to relative year around the treatment year (-2;-1;0;1;2;3) ? – Louise Jun 07 '21 at 09:02
  • (4) I do not fully understand the meaning of the word "binned" in this sentence "**The endpoint is "binned" to index all periods 3 years or more after the law change**". Could you please kindly help me to clarify it? – Louise Jun 07 '21 at 09:09
  • (5) Regarding your **full specification**, it seems that we can only run it with **Belgium** and **Iceland**, isn't it? Because we cannot define the relative indicators (leads/lags) for **Jordan**, I guess. And if the full specification works as I describe, **it seems that it is also an approach to test pre-trends without using the control group, I hope**? – Louise Jun 07 '21 at 09:40
  • (6) Regarding the sentence "**It's also worth highlighting that the authors acquired employment data from 1979–1995, so they didn't have to limit themselves to a finite number of adoption leads.**": I understand what you mean, but I do not know why you added the note "**so they didn't have to limit themselves to a finite number of adoption leads**"; maybe I am missing something important? – Louise Jun 07 '21 at 09:44
  • (7) I am quite confused about the event time **(-2/-1)** in your amazing reproduction of the figure from Venkataramani, 2019. I do not understand what this number means and why it is presented that way. – Louise Jun 07 '21 at 09:55
  • Regarding question **(5)** above, I am wondering whether that alone, without plotting the trends, is enough for pre-trend testing? I sent an email to the author but there has been no response so far, so I am looking for an alternative way to do it. – Louise Jun 07 '21 at 10:02
  • 1
    (1) By "schism" I was drawing attention to how the lines seem to break apart around $t = 2$. (2) By "entities" I mean firms. And the time *relative to* treatment is when we center all treated units around $t = 0$. (3) Yes. You create a new variable to show the relative periods. In other words, the group of treatment/control firms are all some number of time units *approaching* the enactment year (i.e., $-3, -2, -1$) and some number of time units *away from* the enactment year (i.e., $0, +1, +2, +3$). (4) The variable $d^{+\bar{3}}_{kt}$ is a "binned" indicator. It 'turns on' and *stays on*. – Thomas Bilach Jun 08 '21 at 17:51
  • 1
    (5) As indicated in my answer, it is difficult to define the relative periods when the enactment years vary so widely across countries. However, when it comes to plotting the *coefficients*, then you do not need to drop Jordan. Simply put the lead and lag indicators into your model. The values for Jordan will be consistently 0. (6) I simply meant that it's permissible to include more leads. The authors decided to plot leads for 1 and 2 years before adoption. In other studies, the authors may plot estimates 3 or even 4 years before adoption. It's up to you to decide how many to include. – Thomas Bilach Jun 08 '21 at 18:04
  • 1
    (7) Surveys were conducted *biennially* since 1991, hence why the time intervals are expressed this way. They dropped the "2-year period" before the ban to act as a reference. – Thomas Bilach Jun 08 '21 at 18:11
  • Thank you for your amazing explanation, Thomas, everything is clear now. Just one last curiosity on this topic: regarding question (5) above, I remember reading one of your answers previously, and it seems that plotting the coefficients is also a way to do pre-trend testing, isn't it? And we can circumvent the difficulty of plotting the control trend as in Dasgupta's Figure 1? – Louise Jun 09 '21 at 21:01
  • 1
    Correct. It's one way to demonstrate the stability of the trends in the pre-period. It doesn't prove it, it's just evidence to support it. – Thomas Bilach Jun 09 '21 at 23:13
  • Hi @Thomas Bilach, can I ask what the decrease in coefficients over time in the last graph (from Green, 2015's paper) means? – Louise Jul 01 '21 at 01:39
  • 1
    The authors tested the effect of liberalizing bar hours on traffic accidents. Their results suggest they found a *decrease* in accidents once bar hours were extended. Notice how the effects grow larger in subsequent periods. It suggests a growing policy influence over time. – Thomas Bilach Jul 07 '21 at 03:22