Difference-in-difference regression using R

Question

Could you suggest R code for a difference-in-differences regression? I don't understand how many coefficients I need. In my analysis I compare the effect of a new law on stock exchange volume; I have two periods and two samples. Thanks a lot!

Thanks Jeremy, but in Imbens's model the regression contains a series of interactions (products) between variables and dummies. I used:

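# DUMMYCAP * DUMMYTIME already expands to DUMMYCAP + DUMMYTIME + DUMMYCAP:DUMMYTIME,
# so listing the main effects separately as well is redundant (but harmless)
lr1 <- lm(VOLUME ~ DUMMYCAP * DUMMYTIME, data = yourdata)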

but the regression result is not significant.
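For reference, here is a minimal sketch of that 2×2 model on simulated data, which also answers the "how many coefficients" part: intercept, two main effects and their interaction, four in total, with the interaction being the DiD estimate. It assumes a long-format data frame (one row per observation) with hypothetical columns VOLUME, DUMMYCAP (1 = sample affected by the law) and DUMMYTIME (1 = period after the law); all numbers, including the assumed effect of 5, are made up. The last line checks that the interaction coefficient equals the difference between the two before/after changes in cell means.

```r
# Simulated 2x2 difference-in-differences data (hypothetical numbers).
set.seed(1)
n <- 50  # observations per group-period cell
yourdata <- expand.grid(obs = 1:n, DUMMYCAP = 0:1, DUMMYTIME = 0:1)
# Assumed true effect of the law on the affected sample after the change: 5
yourdata$VOLUME <- 100 + 10 * yourdata$DUMMYCAP + 3 * yourdata$DUMMYTIME +
  5 * yourdata$DUMMYCAP * yourdata$DUMMYTIME + rnorm(nrow(yourdata), sd = 4)

# Four coefficients: (Intercept), DUMMYCAP, DUMMYTIME, DUMMYCAP:DUMMYTIME
did <- lm(VOLUME ~ DUMMYCAP * DUMMYTIME, data = yourdata)
summary(did)

# Cell means: rows = DUMMYCAP (0/1), columns = DUMMYTIME (0/1)
m <- tapply(yourdata$VOLUME, list(yourdata$DUMMYCAP, yourdata$DUMMYTIME), mean)
# (affected after - affected before) - (control after - control before)
did_by_hand <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])
all.equal(unname(coef(did)["DUMMYCAP:DUMMYTIME"]), unname(did_by_hand))
```

Because the model is saturated (one parameter per cell), the fitted values are the four cell means, so the match is exact. With real data, whether the DUMMYCAP:DUMMYTIME coefficient is "significant" depends on its standard error; a non-significant estimate is not by itself a sign that the code is wrong.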

This question is about code and its economic significance. I'm sorry if this post is not appropriate for the site, but if someone has run a test of this type and knows how to write the code, I would be grateful.

I had mistakenly flagged this question as too broad when it should have been off-topic. Given that the question is only about code and not conceptually about DiD, it would be more suitable for SO — Andy, Mar 12 '14 at 19:58
This question appears to be off-topic because it asks for code, which is not what this site is about. — whuber, Mar 12 '14 at 20:28

score 3 · Answer 1 · edited Apr 13 '17 at 12:44

I covered this recently in my answer to "Which test should be used to compare two mean differences?", where I also included code for Levene's test of equal residual variances, which you may want to run in your case. Because the residual variances were unequal in my example, I also demonstrated how a rank transformation of the outcome can improve this somewhat (though rank transformation has its drawbacks and isn't appropriate in all circumstances). Finally, I noted some debate on whether independent-samples $t$-tests on pre-post differences would be an appropriate alternative; it seems to depend partly on whether your samples were really drawn at random.
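A rough sketch of those three checks in base R, on simulated data. It assumes the wide layout used further down (one row per stock, PRE.VOLUME before the law, VOLUME after, DUMMYTIME = 0/1 marking the two samples); the data, the deliberately unequal noise levels and the DIFF column are hypothetical. The Levene variant shown is the median-centred (Brown-Forsythe) one, written out by hand; the car package's leveneTest(), whose default is center = median, should give the same F-test.

```r
# Hypothetical wide-format data; the second sample is simulated with noisier VOLUME
set.seed(2)
n <- 60
yourdata <- data.frame(DUMMYTIME = rep(0:1, each = n),
                       PRE.VOLUME = rnorm(2 * n, mean = 100, sd = 15))
yourdata$VOLUME <- 20 + 0.8 * yourdata$PRE.VOLUME + 4 * yourdata$DUMMYTIME +
  rnorm(2 * n, sd = ifelse(yourdata$DUMMYTIME == 1, 12, 4))

fit <- lm(VOLUME ~ scale(PRE.VOLUME, scale = FALSE) + DUMMYTIME, data = yourdata)

# Median-centred Levene test: one-way ANOVA of the absolute deviations of the
# residuals from each sample's median residual
levene_median <- function(x, group) {
  group <- factor(group)
  dev <- abs(x - ave(x, group, FUN = median))
  anova(lm(dev ~ group))
}
lev <- levene_median(residuals(fit), yourdata$DUMMYTIME)
lev

# Rank-transformed outcome as a rough robustness check
fit_rank <- lm(rank(VOLUME) ~ scale(PRE.VOLUME, scale = FALSE) + DUMMYTIME,
               data = yourdata)
summary(fit_rank)

# Welch two-sample t-test on the pre-post differences
yourdata$DIFF <- yourdata$VOLUME - yourdata$PRE.VOLUME
tt <- t.test(DIFF ~ DUMMYTIME, data = yourdata)
tt
```

t.test() defaults to the Welch version (var.equal = FALSE), which does not assume equal variances in the two samples.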

In your case, I imagine the code for a DID general linear model would look something like this:

summary(lm(VOLUME ~ scale(PRE.VOLUME, scale = FALSE) * DUMMYTIME, data = yourdata))

In the above, yourdata is a data.frame with PRE.VOLUME and VOLUME as separate columns for stock exchange volume before and after the law, and DUMMYTIME is a dummy-coded (0/1) variable distinguishing your two samples. If your interaction term (scale(PRE.VOLUME, scale = FALSE):DUMMYTIME) is fairly large and reliable ("significant", basically), interpret sample differences with caution, because that means another DID / ANCOVA assumption is violated: that the regression slopes are the same in both samples. If it isn't, you can drop the interaction term from the model by replacing the * with a +, as in @JeremyMiles' example.
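One way to make the "drop the interaction?" decision explicit is a nested-model F-test. The sketch below simulates data in the same hypothetical wide layout (equal slopes by construction), fits the model with * and with +, and compares them with anova(); with a single interaction term this F-test is equivalent to the t-test on the interaction coefficient.

```r
# Hypothetical wide-format data, simulated with the same slope in both samples
set.seed(3)
n <- 60
yourdata <- data.frame(DUMMYTIME = rep(0:1, each = n),
                       PRE.VOLUME = rnorm(2 * n, mean = 100, sd = 15))
yourdata$VOLUME <- 20 + 0.8 * yourdata$PRE.VOLUME + 4 * yourdata$DUMMYTIME +
  rnorm(2 * n, sd = 5)

# Full model: a separate slope on centred PRE.VOLUME for each sample
full <- lm(VOLUME ~ scale(PRE.VOLUME, scale = FALSE) * DUMMYTIME, data = yourdata)
# Reduced model: common slope ('*' replaced by '+')
reduced <- lm(VOLUME ~ scale(PRE.VOLUME, scale = FALSE) + DUMMYTIME, data = yourdata)

# F-test of the slope-by-sample interaction
cmp <- anova(reduced, full)
cmp

# If the interaction is negligible, this is the adjusted difference between samples
coef(summary(reduced))["DUMMYTIME", ]
```

Centering PRE.VOLUME with scale(..., scale = FALSE) does not change the fit; it makes the DUMMYTIME coefficient in the full model the sample difference at the average pre-law volume rather than at a pre-law volume of zero.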

You can use the ggplot2 package to visualize the regression lines for each sample on a scatterplot:

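library(ggplot2)
# One scatter and one fitted regression line per sample; centring x leaves the lines unchanged
ggplot(yourdata, aes(x = PRE.VOLUME, y = VOLUME, colour = factor(DUMMYTIME))) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ scale(x, scale = FALSE))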
