How to model a bounded lognormal dependent variable with many zeros

Question

I am trying to analyze firms' investment decisions. In my dataset, 70% of the firms choose to invest $0 and 30% invest more than 0 (a continuous variable). The data for that 30% are log-normally distributed.

Which model is recommended for this kind of data: OLS, Tobit, or a GLM? Or something else entirely?

At first I thought a Tobit model would be appropriate, but I gave up on the idea because of the log(0) problem.

You might want to mention that you're working with panel data, if that is still the case. This both complicates the problem, but may allow you to estimate this. — dimitriy, Oct 24 '12 at 00:42

score 3 · Answer 1 · edited Apr 13 '17 at 12:44

You could model this as a zero-inflated gamma (or zero-inflated log-normal), where you assume two processes: one that generates a zero with a certain probability (the firms that don't invest) and, for the firms that do invest, one that follows a gamma (or log-normal) distribution.
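To make the two-process idea concrete, here is a minimal sketch of the log-normal version with no covariates (intercept only). It is not the SAS code linked in this answer; the function names and the simulated data are made up for illustration. Under these assumptions the maximum-likelihood estimates have a closed form, and E(Y) = P(Y>0) * exp(mu + sigma^2/2).

```python
import math
import random


def fit_zero_lognormal(y):
    """Closed-form ML estimates for an intercept-only zero/log-normal mixture."""
    positives = [v for v in y if v > 0]
    n, n_pos = len(y), len(positives)
    p_pos = n_pos / n                                   # estimate of P(Y > 0)
    logs = [math.log(v) for v in positives]
    mu = sum(logs) / n_pos                              # mean of log(Y) given Y > 0
    sigma2 = sum((x - mu) ** 2 for x in logs) / n_pos   # ML variance of log(Y) given Y > 0
    return p_pos, mu, sigma2


def expected_value(p_pos, mu, sigma2):
    """E(Y) = P(Y > 0) * E(Y | Y > 0), using the log-normal mean for the positive part."""
    return p_pos * math.exp(mu + sigma2 / 2)


# Hypothetical data: roughly 70% zeros, log-normal investments otherwise.
rng = random.Random(0)
y = [rng.lognormvariate(2.0, 0.5) if rng.random() < 0.3 else 0.0 for _ in range(5000)]

p_pos, mu, sigma2 = fit_zero_lognormal(y)
e_y = expected_value(p_pos, mu, sigma2)
print(f"P(Y>0)={p_pos:.3f}  mu={mu:.3f}  sigma2={sigma2:.3f}  E(Y)={e_y:.3f}")
```

With covariates, the first part would typically become a logistic regression and the second a regression on log(Y) for the positive observations, but the decomposition of E(Y) stays the same.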

I have used this model several times, writing contrast statements for E(Y) as Dale shows in the link below. It is very useful. I won't go into more detail because the link sums up most of the salient points.

Here is a link to the code for fitting this model in SAS. A question about converting it to R was posed on this site without a definitive answer, but I am sure it is possible.

Another option is to fit two separate models, one for P(Y=0) and one for E(Y | Y>0), and make inferences about E(Y) by simulation (page 150 of Gelman and Hill, 2007).
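As a rough illustration of the simulation step, the sketch below fits the two parts separately (again intercept only, on made-up data) and then draws parameter values to propagate their uncertainty into E(Y). This is my own simplification and not the book's code: the Beta draw assumes a flat prior on P(Y>0), and the draws for the log-scale mean and variance use standard normal-theory results.

```python
import math
import random

rng = random.Random(1)

# Hypothetical data: roughly 70% zeros, log-normal investments otherwise.
y = [rng.lognormvariate(2.0, 0.5) if rng.random() < 0.3 else 0.0 for _ in range(2000)]

# Part 1: counts for the zero / non-zero model.
logs = [math.log(v) for v in y if v > 0]
n_pos = len(logs)
n_zero = len(y) - n_pos

# Part 2: log-scale mean and (unbiased) variance for the positive observations.
mu_hat = sum(logs) / n_pos
s2 = sum((x - mu_hat) ** 2 for x in logs) / (n_pos - 1)


def simulate_expected_value(n_sims):
    """Draw parameters for both parts and return sorted simulated values of E(Y)."""
    draws = []
    for _ in range(n_sims):
        # Uncertainty in P(Y > 0): Beta draw under a flat prior (an assumption).
        p = rng.betavariate(n_pos + 1, n_zero + 1)
        # Uncertainty in the variance: scaled inverse chi-square with n_pos - 1 df.
        chi2 = rng.gammavariate((n_pos - 1) / 2, 2.0)
        sigma2 = (n_pos - 1) * s2 / chi2
        # Uncertainty in the mean, given the drawn variance.
        mu = rng.gauss(mu_hat, math.sqrt(sigma2 / n_pos))
        draws.append(p * math.exp(mu + sigma2 / 2))
    return sorted(draws)


draws = simulate_expected_value(2000)
lower = draws[int(0.025 * len(draws))]
median = draws[len(draws) // 2]
upper = draws[int(0.975 * len(draws)) - 1]
print(f"E(Y): median={median:.3f}, 95% interval=({lower:.3f}, {upper:.3f})")
```

With regression models in each part, the same idea applies: simulate coefficient vectors for both models, compute P(Y>0) * E(Y | Y>0) for each draw, and summarize the resulting distribution.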
