
I am working with a data set of the mass of plastic found at various sites. At most sites we found no plastic, so the data are zero-inflated (see histogram below). I want to model the data using explanatory variables such as human population and season, but I am not sure which model to use. Does anyone have any suggestions?

I have been doing some reading and am not sure if a hurdle model or a zero-inflated model would be best.

[Image: histogram of the data]

Ferdi
  • Possible duplicate of [What is the difference between zero-inflated and hurdle models?](https://stats.stackexchange.com/questions/81457/what-is-the-difference-between-zero-inflated-and-hurdle-models) In addition, it is hard to judge the exact sample size from your histogram, but you seem to have far too few non-zero observations to attribute their size to a variety of explanatory variables. Perhaps restrict yourself to a single explanatory variable, or better yet, collect more data. (Your data are zero-inflated; a hurdle model is technically something else.) – Frans Rodenburg Jun 04 '19 at 06:31
  • Thanks for your reply. We sampled pretty extensively, but didn't find any plastic pollution at most sites (which I guess is a good thing!). I was also starting to think that there are just too many zeros to do any meaningful modeling. – Eleanor Weideman Jun 04 '19 at 08:28

0 Answers