What model for continuous data with excess zeros?

Question

I am relatively new to statistics and have an issue with choosing an appropriate model to describe my data.

I am looking at geographical distribution of point occurrences of a species. As a measure of how densely observations are distributed I have gathered data on the mean distance from an occurrence point to its 15 nearest neighbours. The crux is that these species occurrences are often reported to general "observation sites" marked at a single location. Thus I have a lot of occurrences, all located at the exact same few geographical locations. This causes many of my data points to have a "mean distance to neighbours" of 0, which strongly right-skews my dataset.

So my question is: what distribution model would be the best option to apply to this kind of data?

I briefly looked at zero-inflated and hurdle models, but these seem inappropriate to me because my data are not counts but continuous distances. I also don't have a situation where one process first determines the "presence/absence" of an event and another then determines the nature of that event when present. Rather, I just have an excess of zeros that are generated by the SAME process as the rest of my data values. But I am really not sure!

I have several continuous independent variables.

I am working in R, so if anyone has tips on how I should solve this there, it would be greatly appreciated!

Related (unanswered) question: https://stats.stackexchange.com/questions/282990/does-there-exist-zero-inflated-linear-regression — Tim, Jun 03 '20 at 10:35

score 0 · Answer 1 · answered Jun 04 '20 at 15:58


Based on your description, should you not be scaling up your site occurrences to location occurrences?

Otherwise, a zero-inflated gamma distribution comes to mind. Using glmmTMB, you could specify an intercept-only ziformula, so that your conditional model would tell you the influence of your predictors on mean distance for those cases where that distance is > 0.
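
A minimal sketch of what that could look like in R, assuming the glmmTMB package is installed. The data here are simulated purely for illustration, and the column names (`mean_dist`, `x1`, `x2`) are hypothetical stand-ins for the real response and predictors:

```r
# Sketch only: simulated data, hypothetical column names.
library(glmmTMB)

set.seed(1)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))

# Positive distances: gamma with a log-linear mean (assumed for illustration).
mu    <- exp(1 + 0.5 * d$x1 - 0.3 * d$x2)
shape <- 2
d$mean_dist <- rgamma(n, shape = shape, scale = mu / shape)

# Force roughly 30% of the observations to exactly zero (the "excess zeros").
d$mean_dist[rbinom(n, 1, 0.3) == 1] <- 0

# Zero-inflated gamma: the conditional model describes distances > 0,
# while the intercept-only ziformula estimates one overall zero probability.
fit <- glmmTMB(
  mean_dist ~ x1 + x2,
  ziformula = ~ 1,
  family    = ziGamma(link = "log"),
  data      = d
)

summary(fit)
```

The zero-inflation intercept is reported on the logit scale, so `plogis(fixef(fit)$zi)` converts it to an estimated proportion of zeros. Note that this sketch treats observations as independent, which may not hold for nearest-neighbour distances.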


Angelos Amyntas


There is a more fundamental issue here: your observations are interdependent (mean distance of A to B,C,D is not independent from mean distance of B to A,C,D and so on) – Angelos Amyntas Jun 04 '20 at 16:32
