
I'm pretty new to statistics and need advice on how to analyse zero-inflated, thick-tailed panel data. My sample is a count of enterprise births per city and per year across U.S. cities over 14 years. Many cities have zero counts, while a few cities have very high counts; that is, the distribution in any given year is Zipf-like. I want to test the relationship between these counts and a time-lagged measure of industrial diversity. This predictor is non-zero for every city and year.
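As a first check of the two features described above, the share of zeros and overdispersion (a variance much larger than the mean, which a Poisson model cannot accommodate), a minimal standard-library sketch follows. The function name `count_diagnostics` and the toy counts are hypothetical, not the poster's data; in practice it would be run on each year separately.

```python
import statistics

def count_diagnostics(counts):
    """Summarise the zero share and overdispersion of a list of counts."""
    n = len(counts)
    zeros = sum(1 for c in counts if c == 0)
    mean = statistics.fmean(counts)
    var = statistics.pvariance(counts)  # population variance
    return {
        "n": n,
        "zero_share": zeros / n,
        # A Poisson variable has variance equal to its mean, so a ratio far
        # above 1 signals overdispersion (a reason to prefer a negative binomial).
        "variance_to_mean": var / mean if mean > 0 else float("nan"),
    }

# Hypothetical toy counts for one year (not the poster's data).
toy = [0, 0, 0, 1, 0, 2, 1, 0, 5, 0, 3, 40, 0, 1, 120]
result = count_diagnostics(toy)
print(result)
```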

Here's a plot of my two variables in 2014/2013. The y-axis is logarithmic, so the 376 zero counts (out of 917 points) are omitted.

[Figure: enterprise births (log-scaled y-axis) vs. lagged industrial diversity, 2014/2013]

A similar plot repeats over the 14 years, but there are gaps in the data (some cities appear in certain years and not in others), so the panel is unbalanced (1000 cities observed over 3 to 14 years).
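Because the panel has gaps, the lagged predictor is safer to match on the calendar year than on the previous row: with a plain row shift, a city missing in 2013 would have its 2014 births paired with its 2012 diversity. A minimal sketch, assuming the data are held in dicts keyed by `(city, year)`; the function `lagged_pairs` and the toy values are hypothetical. Zero counts are kept, since they are exactly the points the log-scaled plot hides.

```python
def lagged_pairs(births, diversity, lag=1):
    """Pair each city-year count with the predictor from `lag` years earlier.

    births, diversity: dicts keyed by (city, year).
    Rows whose lagged predictor is missing (a gap in the unbalanced panel)
    are dropped rather than paired with an older year.
    """
    pairs = []
    for (city, year), y in sorted(births.items()):
        x = diversity.get((city, year - lag))
        if x is not None:
            pairs.append((city, year, y, x))
    return pairs

# Hypothetical toy panel: city "B" is missing in 2013.
births = {("A", 2013): 4, ("A", 2014): 0, ("B", 2012): 1, ("B", 2014): 7}
diversity = {("A", 2012): 0.8, ("A", 2013): 0.9, ("B", 2012): 0.5, ("B", 2014): 0.6}
pairs = lagged_pairs(births, diversity)
print(pairs)
```

Here both of city B's rows are dropped because no diversity value exists for the preceding year; whether to drop or impute such rows is a modelling choice.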

I'm considering testing a power law, reasoning that zero- and one-inflated discrete values are (conceptually, not mathematically) consistent with a function converging asymptotically to zero, as well as with the distribution of the non-zero values of my sample. I'm thinking that the panel data should strengthen the cross-sectional relationship (I'm not interested in the longitudinal trend).
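One way to reconcile the power-law idea with the zeros: a count model with a log link and log(x) as the regressor has mean E[y | x] = exp(b0) * x**b1, i.e. a power law in x, and zero counts stay in the likelihood instead of being dropped as on the log-scaled plot. The sketch below fits such a Poisson model by Newton-Raphson on simulated data; the function names, the coefficients 0.5 and 1.5, and the seed are hypothetical. It is only the Poisson version: with overdispersed counts the Poisson standard errors are too small, which is one reason to move to a negative binomial.

```python
import math
import random

def poisson_sample(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def fit_power_law_poisson(x, y, iters=50):
    """Fit E[y] = exp(b0) * x**b1 by Poisson maximum likelihood (Newton-Raphson).

    Equivalent to a Poisson GLM with log link and log(x) as the regressor;
    zero counts remain in the likelihood.
    """
    z = [math.log(v) for v in x]
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iters):
        # Score vector (g) and information matrix (h) of the Poisson log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, yi in zip(z, y):
            mu = math.exp(b0 + b1 * zi)
            r = yi - mu
            g0 += r
            g1 += r * zi
            h00 += mu
            h01 += mu * zi
            h11 += mu * zi * zi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) @ score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1

# Hypothetical simulated data with a known power law: E[y] = exp(0.5) * x**1.5
rng = random.Random(42)
x = [rng.uniform(0.5, 5.0) for _ in range(2000)]
y = [poisson_sample(math.exp(0.5) * xi ** 1.5, rng) for xi in x]
b0, b1 = fit_power_law_poisson(x, y)
print(f"exponent b1 = {b1:.3f}, share of zeros = {y.count(0) / len(y):.2f}")
```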

Is this a good strategy? Is there a more appropriate one? What techniques would you use? Thank you in advance for any help.

syre
  • Maybe a zero-inflated negative binomial (ZINB) would be the best first try. Then you can account for the panel structure with fixed effects or random effects. – Ferdi Mar 28 '17 at 12:03
  • @Ferdi Thank you! My hypothesis is that the intensity of diversity increases the likelihood of firm birth locally. How would I interpret a ZINB in that context? I read [elsewhere](http://stats.stackexchange.com/questions/81457/what-is-the-difference-between-zero-inflated-and-hurdle-distributions-models) about hurdle models. Would a hurdle model be more appropriate for my hypothesis? Power laws are common with urban data. Is there such a thing as a zero-inflated power law model? Finally, would random effects be more appropriate if the focus is the cross-sectional relationship? – syre Mar 29 '17 at 02:48
  • Read about the difference between zero-inflated and hurdle models here: http://stats.stackexchange.com/questions/81457/what-is-the-difference-between-zero-inflated-and-hurdle-distributions-models In any case, I would use a negative binomial distribution rather than a Poisson distribution. – Ferdi Mar 29 '17 at 08:43
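To make the zero-inflated versus hurdle distinction concrete, here are the two probability mass functions built on a negative binomial (NB2 parameterisation, variance mu + alpha * mu**2). In the zero-inflated model a zero can come from two sources (a structural zero, e.g. a city where no births can occur, or a sampling zero from the count process); in the hurdle model one process decides whether a city has any births at all and a zero-truncated count process governs how many. This is a sketch with hypothetical parameter values; in a fitted model mu, and optionally pi or q, would be linked to lagged diversity through regression equations, with the panel effects added on top.

```python
import math

def nb_pmf(k, mu, alpha):
    """Negative binomial pmf with mean mu and variance mu + alpha * mu**2 (NB2)."""
    r = 1.0 / alpha    # size parameter
    p = r / (r + mu)   # success probability
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

def zinb_pmf(k, mu, alpha, pi):
    """Zero-inflated NB: with probability pi the count is a structural zero;
    otherwise it is drawn from the NB, which can also produce zeros."""
    base = (1.0 - pi) * nb_pmf(k, mu, alpha)
    return pi + base if k == 0 else base

def hurdle_nb_pmf(k, mu, alpha, q):
    """Hurdle NB: q = P(count > 0); positive counts follow a zero-truncated NB."""
    if k == 0:
        return 1.0 - q
    return q * nb_pmf(k, mu, alpha) / (1.0 - nb_pmf(0, mu, alpha))

# Hypothetical parameters, chosen only to compare the two zero mechanisms.
mu, alpha = 3.0, 2.0
print(round(zinb_pmf(0, mu, alpha, pi=0.3), 3), round(hurdle_nb_pmf(0, mu, alpha, q=0.6), 3))
```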
