Questions tagged [poisson-regression]

Poisson regression is one of a number of regression models for dependent variables that are counts (non-negative integers). A more general model is negative binomial regression. Both have numerous variants.

Poisson regression is a regression model in which the dependent variable is a count. It is based on the Poisson distribution and therefore assumes equidispersion: the conditional variance of the response equals its conditional mean, E[Y | x] = Var[Y | x]. If equidispersion does not hold, in particular if the data are overdispersed (the variance exceeds the mean), negative binomial regression might be a better approach.
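As a rough illustration of the equidispersion idea, the sketch below computes the ratio of sample variance to sample mean for a list of counts. The function name `dispersion_ratio` and the data are hypothetical, invented for this example. This is only a crude unconditional check: in a regression the assumption concerns the mean and variance conditional on the covariates, so a ratio well above 1 is a hint to investigate further, not a formal test.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Return sample variance divided by sample mean for a list of counts.

    A value near 1 is compatible with equidispersion; a value well above 1
    suggests overdispersion. Covariates are ignored (unconditional check).
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return variance(counts) / m


# Hypothetical counts, invented for illustration only.
counts = [0, 1, 1, 2, 2, 2, 3, 3, 4, 12]
ratio = dispersion_ratio(counts)
print(f"variance / mean = {ratio:.2f}")
```

Here the single large count pushes the variance well above the mean, which is the kind of pattern that motivates looking at a negative binomial or quasi-Poisson model.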

Common variants of the Poisson regression are:

  • Zero-inflated Poisson Regression

The dependent variable is a count, but it takes the value 0 more often than a Poisson distribution would predict.

  • Hurdle model with a Poisson hurdle

The dependent variable is a count with many zeros. In contrast to the zero-inflated Poisson regression, this is a two-part model: a hurdle process (e.g. a probit model) determines whether the count is zero or positive, and a zero-truncated Poisson regression models the positive counts.

  • Zero-truncated Poisson

The dependent variable is a count that can never take the value 0. An example is the number of days a person stays in hospital.
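The three variants above differ mainly in how they treat the value 0. The sketch below writes out their probability mass functions for a fixed rate, without any covariates. All function and parameter names (`zip_pmf`, `pi`, `p_zero`, and so on) are hypothetical choices for this example, and the zero probabilities are plain numbers here, whereas in a real model they would come from a binary regression such as a probit or logit.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Ordinary Poisson probability of observing the count k."""
    return exp(-lam) * lam**k / factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a draw from Poisson(lam), which can itself be zero."""
    base = (1 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def zero_truncated_pmf(k, lam):
    """Poisson conditioned on k >= 1, so the value 0 has probability 0."""
    if k < 1:
        return 0.0
    return poisson_pmf(k, lam) / (1 - exp(-lam))


def hurdle_pmf(k, lam, p_zero):
    """Hurdle model: zero with probability p_zero; positive counts follow
    a zero-truncated Poisson(lam)."""
    if k == 0:
        return p_zero
    return (1 - p_zero) * zero_truncated_pmf(k, lam)


# Hypothetical parameter values, chosen only for illustration.
lam, pi, p_zero = 2.5, 0.3, 0.4
for k in range(4):
    print(k, poisson_pmf(k, lam), zip_pmf(k, lam, pi),
          hurdle_pmf(k, lam, p_zero), zero_truncated_pmf(k, lam))
```

In the zero-inflated model zeros can arise from either component, while in the hurdle model all zeros come from the hurdle part. That is the practical difference between the two descriptions above.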

Literature:

  • Greene, William H. (2003). Econometric Analysis (Fifth ed.). Prentice-Hall. pp. 740–752. ISBN 0130661899.

  • Paternoster R, Brame R (1997). "Multiple routes to delinquency? A test of developmental and general theories of crime". Criminology. 35: 45–84. doi:10.1111/j.1745-9125.1997.tb00870.x.

  • Berk R, MacDonald J (2008). "Overdispersion and Poisson regression" (PDF). Journal of Quantitative Criminology. 24 (3): 269–284. doi:10.1007/s10940-008-9048-4.

  • Ver Hoef, Jay M.; Boveng, Peter L. (2007-01-01). "Quasi-Poisson vs. Negative Binomial Regression: How should we model overdispersed count data?". Ecology. 88 (11): 2766–2772. Retrieved 2016-09-01.

Further reading:

  • Cameron, A. C.; Trivedi, P. K. (1998). Regression analysis of count data. Cambridge University Press. ISBN 0-521-63201-3.

  • Christensen, Ronald (1997). Log-linear models and logistic regression. Springer Texts in Statistics (Second ed.). New York: Springer-Verlag. ISBN 0-387-98247-7. MR 1633357.

  • Gouriéroux, Christian (2000). "The Econometrics of Discrete Positive Variables: the Poisson Model". Econometrics of Qualitative Dependent Variables. New York: Cambridge University Press. pp. 270–83. ISBN 0-521-58985-1.

  • Greene, William H. (2008). "Models for Event Counts and Duration". Econometric Analysis (8th ed.). Upper Saddle River: Prentice Hall. pp. 906–944. ISBN 978-0-13-600383-0.

  • Hilbe, J. M. (2007). Negative Binomial Regression. Cambridge University Press. ISBN 978-0-521-85772-7.

  • Jones, Andrew M.; et al. (2013). "Models for count data". Applied Health Economics. London: Routledge. pp. 295–341. ISBN 978-0-415-67682-3.

  • Wooldridge, J. M. (2002). Econometric analysis of cross section and panel data. Cambridge, Mass: MIT Press. pp. 646–656.

  • Cameron, A. C. and Trivedi, P. K. 2009. Microeconometrics Using Stata. College Station, TX: Stata Press.

  • Cameron, A. C. and Trivedi, P. K. 1998. Regression Analysis of Count Data. New York: Cambridge Press.

  • Cameron, A. C. Advances in Count Data Regression Talk for the Applied Statistics Workshop, March 28, 2009. http://cameron.econ.ucdavis.edu/racd/count.html .

  • Dupont, W. D. 2002. Statistical Modeling for Biomedical Researchers: A Simple Introduction to the Analysis of Complex Data. New York: Cambridge Press.

  • Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications.

  • Long, J. S. and Freese, J. 2006. Regression Models for Categorical Dependent Variables Using Stata, Second Edition. College Station, TX: Stata Press.

Software implementations:

816 questions
103 votes, 5 answers

Diagnostic plots for count regression

What diagnostic plots (and perhaps formal tests) do you find most informative for regressions where the outcome is a count variable? I'm especially interested in Poisson and negative binomial models, as well as zero-inflated and hurdle counterparts…
90 votes, 1 answer

When to use an offset in a Poisson regression?

Does anybody know why offset in a Poisson regression is used? What do you achieve by this?
MarkDollar • 5,575 • 14 • 44 • 60
40 votes, 2 answers

When do Poisson and negative binomial regressions fit the same coefficients?

I’ve noticed that in R, Poisson and negative binomial (NB) regressions always seem to fit the same coefficients for categorical, but not continuous, predictors. For example, here's a regression with a categorical…
35 votes, 5 answers

Why is Poisson regression used for count data?

I understand that for certain datasets such as voting it performs better. Why is Poisson regression used over ordinary linear regression or logistic regression? What is the mathematical motivation for it?
zaxtax • 523 • 1 • 5 • 8
29 votes, 1 answer

Nonlinear vs. generalized linear model: How do you refer to logistic, Poisson, etc. regression?

I have a question about semantics that I would like fellow statisticians' opinions on. We know models such as logistic, Poisson, etc. fall under the umbrella of generalized linear models. The model includes nonlinear functions of the parameters,…
28 votes, 2 answers

Where does the offset go in Poisson/negative binomial regression?

(First of all, just to confirm, an offset variable functions basically the same way in Poisson and negative binomial regression, right?) Reading about the use of an offset variable, it seems to me that most sources recommend including that variable…
27 votes, 3 answers

Interpreting plot of residuals vs. fitted values from Poisson regression

I am trying to fit data with a GLM (poisson regression) in R. When I plotted the residuals vs the fitted values, the plot created multiple (almost linear with a slight concave curve) "lines". What does this mean? library(faraway) modl <-…
jocelyn • 271 • 1 • 3 • 3
24 votes, 2 answers

In a Poisson model, what is the difference between using time as a covariate or an offset?

I recently discovered how to model exposures over time using the log of (e.g.) time as an offset in a Poisson regression. I understood that the offset corresponds to having time as covariate with coefficient 1. I'd like to better understand the…
Bakaburg • 2,293 • 3 • 21 • 30
24 votes, 1 answer

Why is the quasi-Poisson in GLM not treated as a special case of negative binomial?

I'm trying to fit generalized linear models to some sets of count data that might or might not be overdispersed. The two canonical distributions that apply here are the Poisson and Negative Binomial (Negbin), with E.V. $\mu$ and variance $Var_P =…
24 votes, 1 answer

When to use Poisson vs. geometric vs. negative binomial GLMs for count data?

I'm trying to lay out for myself when it's appropriate to use which regression type (geometric, Poisson, negative binomial) with count data, within the GLM framework (only 3 of the 8 GLM distributions are used for count data, although most of what…
24 votes, 1 answer

Goodness of fit and which model to choose linear regression or Poisson

I need some advice regarding two main dilemmas in my research, which is a case study of 3 big pharmaceuticals and innovation. Number of patents per year is the dependent variable. My questions are What are the most important criteria for a good…
22 votes, 1 answer

Latent variable interpretation of generalized linear models (GLMs)

Short version: We know that logistic regression and probit regression can be interpreted as involving a continuous latent variable that gets discretized according to some fixed threshold prior to observation. Is a similar latent variable…
19 votes, 2 answers

When someone says residual deviance/df should ~ 1 for a Poisson model, how approximate is approximate?

I've often seen the advice for checking whether or not a Poisson model fit is over-dispersed involving dividing the residual deviance by the degrees of freedom. The resulting ratio should be "approximately 1". The question is what range are we…
19 votes, 2 answers

Poisson or quasi poisson in a regression with count data and overdispersion?

I have count data (demand/offer analysis with counting number of customers, depending on - possibly - many factors). I tried a linear regression with normal errors, but my QQ-plot is not really good. I tried a log transformation of the answer: once…
18 votes, 2 answers

How is it possible that Poisson GLM accepts non-integer numbers?

I am really stunned by the fact that the Poisson GLM accepts non-integer numbers! Look: Data (contents of data.txt): 1 2001 0.25 1 1 2002 0.5 1 1 2003 1 1 2 2001 0.25 1 2 2002 0.5 1 2 2003 1 1 R script: t…
Tomas • 5,735 • 11 • 52 • 93