Questions tagged [zero-inflation]

An excess of zeros in a variable relative to a specified reference distribution. Regression approaches include zero-inflated models and hurdle (two-part) models. For count data, zero-inflated and hurdle models based on the Poisson or negative binomial distribution are common (ZIP/ZINB and HP/HNB).
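As a rough illustration of the two model families named in the tag description, here is a minimal Python sketch of a zero-inflated Poisson (ZIP) and a hurdle Poisson (HP) probability mass function. The function names and the parameters `pi` (mixing weight of the extra zeros) and `p0` (hurdle probability of a zero) are assumptions chosen for this example; they do not come from any particular package.

```python
import math

def poisson_pmf(k, lam):
    """Plain Poisson probability P(Y = k) with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: a point mass at 0 (weight pi) mixed with Poisson(lam).

    Zeros come from two sources, so P(Y = 0) is never below the Poisson value.
    """
    base = (1 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base

def hurdle_pmf(k, lam, p0):
    """Hurdle Poisson: P(Y = 0) = p0; positive counts follow a zero-truncated Poisson.

    All zeros come from one source, so p0 may be above or below the Poisson value.
    """
    if k == 0:
        return p0
    return (1 - p0) * poisson_pmf(k, lam) / (1 - math.exp(-lam))
```

Under these assumptions, the ZIP can only add zeros to the Poisson baseline, while the hurdle model sets the zero probability freely, so it can also describe fewer zeros than the Poisson would imply.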

509 questions
104
votes
4 answers

What is the difference between zero-inflated and hurdle models?

I wonder if there is a clear-cut difference between the so-called zero-inflated distributions (models) and so-called hurdle-at-zero distributions (models)? The terms occur quite often in the literature and I suspect they are not the same, but would…
skulker
103
votes
5 answers

Diagnostic plots for count regression

What diagnostic plots (and perhaps formal tests) do you find most informative for regressions where the outcome is a count variable? I'm especially interested in Poisson and negative binomial models, as well as zero-inflated and hurdle counterparts…
36
votes
3 answers

Is a "hurdle model" really one model? Or just two separate, sequential models?

Consider a hurdle model predicting count data y from a normal predictor x:

    set.seed(1839)
    # simulate poisson with many zeros
    x <- rnorm(100)
    e <- rnorm(100)
    y <- rpois(100, exp(-1.5 + x + e))
    # how many zeroes?
    table(y == 0)
    FALSE TRUE
    31 …
Mark White
33
votes
2 answers

How to model non-negative zero-inflated continuous data?

I'm currently trying to apply a linear model (family = gaussian) to an indicator of biodiversity that cannot take values lower than zero, is zero-inflated and is continuous. Values range from 0 to a little over 0.25. As a consequence, there is quite…
26
votes
6 answers

Beta regression of proportion data including 1 and 0

I am trying to produce a model for which I have a response variable which is a proportion between 0 and 1, this includes quite a few 0s and 1s but also many values in between. I am thinking about attempting a beta regression. The package I have…
25
votes
2 answers

Fitting custom distributions by MLE

My question relates to fitting custom distributions in R but I feel it has enough of a probability element to remain on CV. I have an interesting set of data which has the following characteristics: large mass at zero; sizeable mass below a…
epp
24
votes
1 answer

When to use Poisson vs. geometric vs. negative binomial GLMs for count data?

I'm trying to lay out for myself when it's appropriate to use which regression type (geometric, Poisson, negative binomial) with count data, within the GLM framework (only 3 of the 8 GLM distributions are used for count data, although most of what…
23
votes
5 answers

Dealing with 0,1 values in a beta regression

I have some data in [0,1] which I would like to analyze with a beta regression. Of course, something needs to be done to accommodate the 0 and 1 values. I dislike modifying data to fit a model. Also, I don't believe that zero and 1 inflation is a good…
20
votes
2 answers

Why exactly can't beta regression deal with 0s and 1s in the response variable?

Beta regression (i.e. a GLM with a beta distribution and usually the logit link function) is often recommended for a response (aka dependent) variable taking values between 0 and 1, such as fractions, ratios, or probabilities: Regression for an…
19
votes
3 answers

Zero inflated distributions, what are they really?

I am struggling to understand zero-inflated distributions. What are they? What's the point? If I have data with many zeroes, then I could first fit a logistic regression to calculate the probability of zeroes, and then I could remove all the zeroes,…
Calro
19
votes
3 answers

Can a model for non-negative data with clumping at zeros (Tweedie GLM, zero-inflated GLM, etc.) predict exact zeros?

A Tweedie distribution can model skewed data with a point mass at zero when the parameter $p$ (exponent in the mean-variance relationship) is between 1 and 2. Similarly a zero-inflated (whether otherwise continuous or discrete) model may have a…
17
votes
4 answers

Zero-inflated negative binomial mixed-effects model in R

Is there such a package that provides for zero-inflated negative binomial mixed-effects model estimation in R? By that I mean: Zero-inflation where you can specify the binomial model for zero inflation, like in function zeroinfl in package pscl:…
16
votes
0 answers

Gamma hurdle model for continuous response?

I am modelling invertebrate.biomass ~ habitat.type * calendar.day + habitat.type * calendar.day ^ 2, with a random intercept of transect.id (50 transects were repeated 5 times). My response is zero-heavy - about 25% are 0s - and the non-zeroes are…
Tom Finch
15
votes
2 answers

"Zero-inflated" predictors in regression?

I know that zero-inflated models (e.g. zero-inflated Poisson or negative binomial models) can be used for dependent variables. I also know that in general there are no assumptions for the independent variables (i.e. predictors) in regression…
KuJ
14
votes
1 answer

Zero-inflated Poisson regression

Suppose $ \textbf{Y} = (Y_1, \dots, Y_n)'$ are independent and $$\eqalign{ Y_i = 0 & \text{with probability} \ p_i+(1-p_i)e^{-\lambda_i}\\ Y_i = k & \text{with probability} \ (1-p_i)e^{-\lambda_i} \lambda_{i}^{k}/k! }$$ Also suppose the parameters…
Damien
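The zero-inflated Poisson probabilities quoted in the question above translate directly into a log-likelihood. The following Python sketch evaluates it for given observation-level values $p_i$ and $\lambda_i$; the function name `zip_loglik` and its list-based interface are assumptions made for illustration, and no link functions or covariates are modelled because the excerpt is truncated before it specifies them.

```python
import math

def zip_loglik(y, p, lam):
    """Log-likelihood of independent zero-inflated Poisson observations.

    y   : observed counts y_i (non-negative integers)
    p   : zero-inflation probabilities p_i, each in [0, 1)
    lam : Poisson means lambda_i, each > 0
    """
    total = 0.0
    for y_i, p_i, lam_i in zip(y, p, lam):
        if y_i == 0:
            # P(Y_i = 0) = p_i + (1 - p_i) * exp(-lambda_i)
            total += math.log(p_i + (1 - p_i) * math.exp(-lam_i))
        else:
            # P(Y_i = k) = (1 - p_i) * exp(-lambda_i) * lambda_i^k / k!
            total += (math.log(1 - p_i) - lam_i
                      + y_i * math.log(lam_i) - math.lgamma(y_i + 1))
    return total
```

With every $p_i = 0$ the expression reduces to the ordinary Poisson log-likelihood, which gives a quick sanity check on the implementation.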