
I've tried to fit a zero-inflated negative binomial model with zeroinfl (package pscl):

library(pscl)
model.zinb <- zeroinfl(formula = y ~ x1 + x2 + x3, data = data, dist = "negbin")

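library(ggplot2)

# observed counts and fitted values, stacked with a label for the fill colour
a = data.frame(count = data$y)
b = data.frame(count = fitted(model.zinb))
a$colour = "data"
b$colour = "fitted"
hist = rbind(a, b)

# overlaid density histograms of observed and fitted values
ggplot(hist, aes(count, fill = colour)) +
      geom_histogram(alpha = 0.5, aes(y = ..density..), position = 'identity', bins = 31)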

However, when I plot the count data y together with the fitted values fitted(model.zinb), the fitted values contain almost nothing between 0 and 1 (see plot). It looks as if zeroinfl didn't work. Since I'm a beginner in this field, I'd appreciate some advice. Thanks!

[Plot: overlaid density histograms of the observed data and the fitted values]

user333591
  • What is your expected behavior? How does this deviate from the expected behavior? – Sycorax Aug 30 '21 at 18:43
  • I expected the model to predict more values close to zero, since my data is zero-inflated (and fitted(model.zinb) predicts almost no values close to zero, as can be seen in the plot). I'm confused. – user333591 Aug 30 '21 at 18:52
  • Or maybe I am misunderstanding something. I know that there are structural zeros (from the zero-generating process) and sampling zeros (from the count-generating process). The fitted(model.zinb) command returns the mean response values of the fitted negative binomial distribution and therefore does not contain the structural zeros, right? So maybe this is the reason that no zeros are predicted? – user333591 Aug 30 '21 at 20:22
  • Yeah, this description seems correct. I think you were expecting something like posterior draws from the model given these predictors, instead of point estimates per observation. – Sycorax Aug 30 '21 at 21:02

1 Answer


The fitted() method for zeroinfl objects returns the fitted mean $\hat \mu$ for each observation, which can be pretty far from some of the counts $y$ that nevertheless have substantial probability $f_\mathrm{zeroinfl}(y, \hat \mu)$. This is explained and illustrated in the answers to:

Can a model for non-negative data with clumping at zeros (Tweedie GLM, zero-inflated GLM, etc.) predict exact zeros?
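To make the difference between fitted means and fitted probabilities concrete, here is a minimal sketch on simulated data. The data frame `sim`, the single regressor `x1`, and all parameter values are made up for illustration; only `zeroinfl()`, `fitted()`, and `predict(..., type = "prob")` come from pscl.

```r
library(pscl)  # provides zeroinfl() and its fitted()/predict() methods

set.seed(1)
n  <- 500
x1 <- rnorm(n)

# Simulated zero-inflated negative binomial counts (illustrative values only):
# about 30% structural zeros, the rest drawn from a negative binomial
mu <- exp(1 + 0.5 * x1)
y  <- ifelse(rbinom(n, 1, 0.3) == 1, 0L, rnbinom(n, size = 2, mu = mu))
sim <- data.frame(y = y, x1 = x1)

fit <- zeroinfl(y ~ x1, data = sim, dist = "negbin")

# fitted() returns one expected count per observation; these means are
# strictly positive, so a histogram of them shows no mass at zero
mu_hat <- fitted(fit)

# predict(type = "prob") returns P(Y = k) for k = 0, ..., max(y),
# one row per observation
p_hat <- predict(fit, type = "prob")

# Expected frequency of each count: sum the probabilities over observations
expected <- colSums(p_hat)
observed <- table(factor(sim$y, levels = 0:max(sim$y)))

comparison <- data.frame(count    = 0:max(sim$y),
                         observed = as.vector(observed),
                         expected = round(expected, 1))
head(comparison)
```

The `expected` column, not the vector of fitted means, is the quantity that is comparable to a histogram of the observed counts.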

Moreover, even when the observed counts are compared with expected probabilities (rather than means), overlaid histograms are hard to read; deviations are easier to judge in a so-called hanging rootogram. See: Kleiber & Zeileis (2016). The American Statistician, 70(3), 296–303. doi:10.1080/00031305.2016.1173590. An R implementation is available in the countreg package on R-Forge (successor to the pscl implementation) and illustrated in:

Confused on how to interpret ZINB and Hurdle models

https://stackoverflow.com/questions/43075911/examining-residuals-and-visualizing-zero-inflated-poission-r/43584320
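For readers who want to see what a hanging rootogram draws, here is a hand-rolled base R sketch. The helper `hanging_rootogram()` is hypothetical and is not the countreg implementation; the example uses simulated Poisson data purely so that it is self-contained.

```r
# Hypothetical helper (not part of pscl or countreg): a hanging rootogram
# drawn by hand from observed and expected frequencies.
hanging_rootogram <- function(observed, expected,
                              counts = seq_along(observed) - 1) {
  top    <- sqrt(expected)        # bars hang from the expected curve ...
  bottom <- top - sqrt(observed)  # ... and extend down by sqrt(observed)
  plot(counts, top, type = "n",
       ylim = range(c(top, bottom, 0)),
       xlab = "Count", ylab = "sqrt(Frequency)")
  rect(counts - 0.45, bottom, counts + 0.45, top,
       col = "grey80", border = "grey40")
  lines(counts, top, type = "b", pch = 19, col = "red")
  abline(h = 0, lty = 2)          # reference line for judging deviations
  invisible(data.frame(count = counts, top = top, bottom = bottom))
}

# Illustrative data: Poisson counts compared with a fitted Poisson
set.seed(1)
y        <- rpois(200, lambda = 2)
counts   <- 0:max(y)
observed <- as.vector(table(factor(y, levels = counts)))
expected <- length(y) * dpois(counts, lambda = mean(y))

res <- hanging_rootogram(observed, expected, counts)
```

A bar that drops below the zero line means the count occurs more often than the model expects; a bar that stops short of the line means it occurs less often. For a zeroinfl fit, the expected frequencies could be obtained as `colSums(predict(model.zinb, type = "prob"))`.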

Achim Zeileis