I would like to use a log-log model, and one of my independent variables is log(investment in machinery/employee).

Some firms report 0 investment. Since I have to take logs, I would like to know which approach is most appropriate:

  1. Take log(investment in machinery/employee), set log(0) = 0, and add a dummy variable that equals 1 where I set log(0) = 0 and 0 otherwise.

  2. Add an arbitrarily small amount x when investment is 0 and take log(0 + x).

  3. Any other options?
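To make options 1 and 2 concrete, here is a minimal sketch of how I would construct the regressors in each case. The function names, the `investment` values, and the offset of 0.01 are all made up for illustration; this only builds the columns and does not fit any model.

```python
import numpy as np

def log_with_zero_dummy(x):
    """Option 1: log(x) where x > 0, 0 where x == 0, plus a dummy marking the zeros."""
    x = np.asarray(x, dtype=float)
    is_zero = (x == 0)
    # Swap zeros for 1.0 before logging so that log gives exactly 0 there
    # and np.log never sees a zero.
    log_x = np.log(np.where(is_zero, 1.0, x))
    return log_x, is_zero.astype(float)

def log_with_offset(x, offset):
    """Option 2: replace zeros with a small positive offset, then take logs."""
    x = np.asarray(x, dtype=float)
    return np.log(np.where(x == 0, offset, x))

# Hypothetical investment per employee for five firms (illustrative values only).
investment = np.array([0.0, 2.5, 0.0, 10.0, 1.0])

log_inv, zero_dummy = log_with_zero_dummy(investment)      # option 1 regressors
log_inv_offset = log_with_offset(investment, offset=0.01)  # option 2 regressor
```

Under option 1 both `log_inv` and `zero_dummy` would enter the regression, so the dummy's coefficient absorbs the level shift for zero-investment firms. Under option 2 the zero-investment firms get the value log(offset), so that regressor depends directly on the arbitrary choice of x.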

  • What's the physical situation when firms report a zero? Is it a true zero or missing data? If it is missing data, would imputation be the cleaner way out? – curious_cat Mar 05 '13 at 17:48
  • I've seen both approaches used "in the wild", but I just wanted to offer a friendly reminder that people often replace missing values with 0, so you might want to make sure that these are the "real" values instead of indicating "no data". – Matt Krause Mar 05 '13 at 17:50
  • Duplicates: http://stats.stackexchange.com/questions/6563/80-of-missing-data-in-a-single-variable and http://stats.stackexchange.com/questions/30728/how-small-a-quantity-should-be-added-to-x-to-avoid-taking-the-log-of-zero. Both treat different aspects of this question, but together they cover it fully. – whuber Mar 05 '13 at 18:03
