I am estimating a difference-in-differences model based on propensity score matching. The treatment variable indicates whether a household registered for a public insurance program that was only active for two years. One of the variables I would like to explain is the difference in household income before and after the program. The variable is not skewed but has very long tails on both sides, and the median is about zero. I was wondering whether a neg-log transformation is appropriate in this setting. Since my estimate will just be a scalar and not a distribution, I am also puzzled about how to interpret it or transform it back.
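A common definition of the neg-log (signed log) transform is sign(x) * log(1 + |x|), which maps zero to zero and keeps the sign, so it can handle long tails on both sides. The sketch below assumes that definition and uses invented income changes, not the asker's data. It illustrates the interpretation problem: taking the mean on the transformed scale and transforming it back does not give the mean income difference, so a scalar effect estimated on the transformed scale cannot simply be mapped back into currency units.

```python
import numpy as np


def neglog(x):
    """Signed log transform: sign(x) * log(1 + |x|), so 0 maps to 0 and signs are kept."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(np.abs(x))


def neglog_inverse(y):
    """Inverse of neglog: sign(y) * (exp(|y|) - 1)."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.expm1(np.abs(y))


# Hypothetical income changes: median exactly zero, long tails on both sides
diff = np.array([-50_000.0, -800.0, -20.0, 0.0, 15.0, 900.0, 60_000.0])

transformed = neglog(diff)
round_trip_ok = np.allclose(neglog_inverse(transformed), diff)

# Mean on the raw scale vs. the mean on the transformed scale mapped back
mean_raw = diff.mean()
mean_back = float(neglog_inverse(transformed.mean()))

print(round_trip_ok, mean_raw, mean_back)
```

Here the raw mean is roughly 1,442, while the back-transformed mean is close to zero: the inverse transform recovers individual values exactly, but not averages or differences of averages.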

gung - Reinstate Monica

user59309
-
I don't quite understand your description of the income variable, because if the median is zero, then about half of the sample has negative income. I find this hard to imagine. For strictly positive incomes, income is usually approximately normally distributed after a log transformation. – Andy Oct 26 '14 at 10:52
-
Hi Andy, thanks for your question. The outcome variable is the difference in income before and after treatment. Income in both periods has negative values because it includes returns on investments etc., which in some cases are negative. – user59309 Oct 26 '14 at 19:12
-
Hm, I'm still puzzled. I thought you were running a difference-in-differences regression. The outcome should therefore be income, which for the treatment group is affected by signing up for public insurance. I can't quite follow the idea of taking the difference of pre- and post-treatment income. – Andy Oct 26 '14 at 19:28
-
Difference-in-differences compares the average change over time in the outcome variable for the treatment group to the average change over time for the control group. For PSM in Stata, that means you have to create the difference in the outcome variable over time yourself, and the matching commands will then estimate the difference across groups. – user59309 Oct 27 '14 at 18:09
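The procedure described in that comment can be sketched for a 1:1 matched sample. The numbers are invented, and the pairing (row i of the control arrays being the match of treated unit i) is assumed to come from the propensity score matching step; psmatch2 and teffects themselves are not reproduced here. With 1:1 matching, averaging the treated-minus-control gap in changes gives the same number as the difference in average changes.

```python
import numpy as np

# Hypothetical 1:1 matched sample: row i of the control arrays is the match of treated unit i
pre_treat = np.array([100.0, 200.0, 300.0])
post_treat = np.array([130.0, 220.0, 360.0])
pre_ctrl = np.array([110.0, 190.0, 310.0])
post_ctrl = np.array([120.0, 200.0, 330.0])

# Step 1: build the change in the outcome over time yourself
change_treat = post_treat - pre_treat
change_ctrl = post_ctrl - pre_ctrl

# Step 2: average gap in those changes between each treated unit and its match
att_pairs = np.mean(change_treat - change_ctrl)

# The same number written as a difference in average changes
did_means = change_treat.mean() - change_ctrl.mean()

print(att_pairs, did_means)
```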
-
Oh okay, now I see what you are trying to achieve. This isn't the correct approach, though. You should first run the matching (I guess you are doing 1:1 matching), then use the sample of matched individuals in a diff-in-diff regression with dummies for the treatment group and the treatment period, and their interaction. The answer to this [question](http://stats.stackexchange.com/questions/61218/propensity-score-matching-with-panel-data) shows you how to do propensity score matching with panel data. Back then I had the same application of PSM and DiD in mind as you :) – Andy Oct 27 '14 at 18:15
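A minimal sketch of that regression, using plain least squares on invented data in long format (one row per household and period). It reports point estimates only; in practice one would also want standard errors clustered by household, which this sketch does not compute. With no further covariates in the regression, the coefficient on the interaction equals the difference in average changes between treated units and their matched controls.

```python
import numpy as np

# Hypothetical matched sample: 3 treated units and their 3 matched controls
pre_treat = np.array([100.0, 200.0, 300.0])
post_treat = np.array([130.0, 220.0, 360.0])
pre_ctrl = np.array([110.0, 190.0, 310.0])
post_ctrl = np.array([120.0, 200.0, 330.0])

# Stack into long format: one row per unit and period
y = np.concatenate([pre_treat, post_treat, pre_ctrl, post_ctrl])
treat = np.concatenate([np.ones(6), np.zeros(6)])  # treatment-group dummy
post = np.tile(np.concatenate([np.zeros(3), np.ones(3)]), 2)  # treatment-period dummy

# Design matrix: intercept, group dummy, period dummy, and their interaction
X = np.column_stack([np.ones_like(y), treat, post, treat * post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
did_coef = coef[3]  # coefficient on the interaction: the diff-in-diff estimate

print(coef)
```

Covariates, if wanted, would enter as additional columns of `X`.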
-
Andy, thank you for the suggestion. I feed psmatch2 and teffects the pre-treatment variables for the propensity score specification. As the outcome, I use the change in income over time. If I understand it correctly, both methods (this one and the one you suggested) should lead to the same estimate. – user59309 Oct 27 '14 at 20:47
-
They shouldn't, because constructing the difference between pre- and post-treatment outcomes yourself does not control for any covariates. This is taken care of in the DiD regression on the matched sample. Studies in economics typically do it this way for exactly this reason (the question I linked contains two links to papers that apply this technique). If I had to guess how to do it, I'd stick with what has already been done in the literature. – Andy Oct 27 '14 at 20:53