
To analyse zero-inflated bird counts, I'd like to apply zero-inflated count models using the R package pscl. However, looking at the example in the documentation for one of the main functions (?zeroinfl), I've begun to doubt what the real advantage of these models is. Following the sample code given there, I fitted standard Poisson, quasi-Poisson and negative binomial models, simple zero-inflated Poisson and negative binomial models, and zero-inflated Poisson and negative binomial models with regressors for the zero component. Then I inspected the histograms of the observed and the fitted data. (Here's the code to replicate that.)

library(pscl)
library(MASS)  # provides glm.nb()
data("bioChemists", package = "pscl")

## standard count data models
fm_pois  <- glm(art ~ .,    data = bioChemists, family = poisson)
fm_qpois <- glm(art ~ .,    data = bioChemists, family = quasipoisson)
fm_nb    <- glm.nb(art ~ ., data = bioChemists)

## with simple inflation (no regressors for zero component)
fm_zip  <- zeroinfl(art ~ . | 1, data = bioChemists)
fm_zinb <- zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin")

## inflation with regressors
fm_zip2  <- zeroinfl(art ~ fem + mar + kid5 + phd + ment | fem + mar + kid5 + phd + 
                     ment, data = bioChemists)
fm_zinb2 <- zeroinfl(art ~ fem + mar + kid5 + phd + ment | fem + mar + kid5 + phd + 
                     ment, data = bioChemists, dist = "negbin")

## histograms
breaks <- seq(-0.5,20.5,1)
par(mfrow=c(4,2))
hist(bioChemists$art,  breaks=breaks)
hist(fitted(fm_pois),  breaks=breaks)
hist(fitted(fm_qpois), breaks=breaks)
hist(fitted(fm_nb),    breaks=breaks)
hist(fitted(fm_zip),   breaks=breaks)
hist(fitted(fm_zinb),  breaks=breaks)
hist(fitted(fm_zip2),  breaks=breaks)
hist(fitted(fm_zinb2), breaks=breaks)

[Figure: Histogram of observed and fitted data]

I can't see any fundamental difference between the models (apart from the fact that the example data don't appear very "zero-inflated" to me...); in fact, none of the models yields a halfway reasonable estimate of the number of zeros. Can anyone explain what the advantage of the zero-inflated models is? I suppose there must have been a reason to choose this as the example for the function.

gung - Reinstate Monica
user7417

2 Answers


I think this is a poorly chosen data set for exploring the advantages of zero-inflated models because, as you note, there isn't much zero inflation.

plot(fitted(fm_pois), fitted(fm_zinb))

shows that the predicted values are almost identical.

In data sets with more zero-inflation, the ZI models give different (and usually better fitting) results than Poisson.
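To make that concrete, here is a minimal sketch on simulated data. The data-generating process (40% structural zeros, one covariate `x`, the coefficients 0.5) and the object names `sim`, `fit_pois` and `fit_zip` are assumptions chosen purely for illustration; they are not from the question. It compares the observed number of zeros with the number each model expects.

```r
library(pscl)

set.seed(1)
n <- 1000
x <- rnorm(n)

# Assumed process: 40% structural zeros, otherwise Poisson counts
structural_zero <- rbinom(n, size = 1, prob = 0.4)
y <- ifelse(structural_zero == 1, 0L, rpois(n, lambda = exp(0.5 + 0.5 * x)))
sim <- data.frame(y = y, x = x)

fit_pois <- glm(y ~ x, data = sim, family = poisson)
fit_zip  <- zeroinfl(y ~ x | 1, data = sim)

# Expected number of zeros: sum over observations of P(Y_i = 0) under each model
zeros <- c(
  observed = sum(sim$y == 0),
  poisson  = sum(dpois(0, fitted(fit_pois))),
  zip      = sum(predict(fit_zip, type = "prob")[, 1])
)
round(zeros)
```

With this much inflation the plain Poisson model should expect far fewer zeros than were observed, while the ZIP model should come close.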

Another way to compare the fit of the models is to compare the size of residuals:

## difference in absolute residuals; negative values mean the Poisson residual is smaller
boxplot(abs(resid(fm_pois)) - abs(resid(fm_zinb)))

shows that, even here, the residuals from the Poisson are smaller than those from the ZINB. If you have some idea of what magnitude of residual is really problematic, you can see what proportion of the residuals in each model exceed it. E.g. if being off by more than 1 were unacceptable,

## count residuals larger than 1 in absolute value
sum(abs(resid(fm_pois)) > 1)
sum(abs(resid(fm_zinb)) > 1)

shows the latter is a bit better - 20 fewer large residuals.

Then the question is whether the added complexity of the models is worth it to you.
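One common way to weigh fit against complexity is an information criterion. The following is only a sketch of that idea, not part of the original analysis: it refits four of the question's models and tabulates their AIC, where lower is better and each extra parameter is penalised. The name `aic_tab` is introduced here for illustration.

```r
library(pscl)
library(MASS)  # glm.nb()
data("bioChemists", package = "pscl")

fm_pois <- glm(art ~ ., data = bioChemists, family = poisson)
fm_nb   <- glm.nb(art ~ ., data = bioChemists)
fm_zip  <- zeroinfl(art ~ . | 1, data = bioChemists)
fm_zinb <- zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin")

# One row per model: number of estimated parameters (df) and AIC
aic_tab <- AIC(fm_pois, fm_nb, fm_zip, fm_zinb)
aic_tab[order(aic_tab$AIC), ]
```

AIC is only one possible criterion; which trade-off is acceptable still depends on what the model is for.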

Peter Flom

The fitted values are estimated means, so they will show less dispersion than the observed values, which also contain random variation around those means. You're not making a meaningful comparison. To take a simple case, if your data were just $X_i\sim\mathrm{Pois}(\mu)$ you wouldn't compare a histogram of $x_i$ against a histogram of the fitted value $\hat{\mu}$, which is the same for all $i$! It would, however, be reasonable to simulate values $x^*_i$ from $X^*_i\sim\mathrm{Pois}(\hat{\mu})$ and compare histograms of $x_i^*$ and $x_i$.
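The simple case can be sketched in base R as follows. The sample size, the true mean of 1.7 and the seed are arbitrary assumptions made only for this illustration.

```r
set.seed(42)
n <- 500
x <- rpois(n, lambda = 1.7)           # "observed" data, assumed true mean 1.7

mu_hat <- mean(x)                     # ML estimate of mu: one fitted value for every i
x_star <- rpois(n, lambda = mu_hat)   # counts simulated from the fitted model

# Histograms of observed and simulated counts on a common scale
breaks <- seq(-0.5, max(x, x_star) + 0.5, by = 1)
par(mfrow = c(1, 2))
hist(x,      breaks = breaks, main = "Observed")
hist(x_star, breaks = breaks, main = "Simulated from Pois(mu_hat)")

# Proportion of zeros: observed, simulated, and implied by the fitted model
zero_share <- c(observed  = mean(x == 0),
                simulated = mean(x_star == 0),
                implied   = dpois(0, mu_hat))
zero_share
```

For the regression models in the question, the analogous step would be to draw counts with `rpois(nrow(bioChemists), fitted(fm_pois))`, or to sum `dpois(0, fitted(fm_pois))` for the expected number of zeros, rather than plotting a histogram of `fitted(fm_pois)` itself.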

Scortchi - Reinstate Monica