Questions tagged [count-data]

Count data are non-negative integers that record how many times something occurred or how many items were observed.

When such data are the dependent variable in a regression, Poisson or negative binomial regression may be appropriate methods. One common problem is "zero-inflation" (where the proportion of zero values is greater than the assumed count distribution, such as the Poisson, predicts); there are various models for dealing with this, for example zero-inflated and hurdle models.
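As a rough illustration of the zero-inflation idea described above, the sketch below compares the observed share of zeros in a sample with the share a Poisson distribution with the same mean would predict, exp(-mean). The function name `zero_inflation_summary` and the sample `counts` are hypothetical and chosen only for this example. An excess of zeros relative to the Poisson can also arise from plain overdispersion, so this is a quick check, not a formal test.

```python
import math
from statistics import mean, variance


def zero_inflation_summary(counts):
    """Compare the observed share of zeros with the share predicted by a
    Poisson distribution that has the same mean as the sample."""
    if len(counts) < 2 or any(c < 0 or c != int(c) for c in counts):
        raise ValueError("counts must hold at least two non-negative integers")
    m = mean(counts)
    observed_zero = sum(1 for c in counts if c == 0) / len(counts)
    expected_zero = math.exp(-m)  # P(Y = 0) under Poisson(m)
    # Variance-to-mean ratio: about 1 for Poisson data, above 1 when overdispersed.
    dispersion = variance(counts) / m if m > 0 else float("nan")
    return {
        "mean": m,
        "observed_zero": observed_zero,
        "expected_zero": expected_zero,
        "dispersion": dispersion,
    }


# Hypothetical sample with many zeros (illustration only, not real data).
counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 6]
summary = zero_inflation_summary(counts)
print(summary)
```

If the observed share of zeros is well above the Poisson-expected share, a negative binomial, zero-inflated, or hurdle model may be worth comparing against the plain Poisson fit.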

Wikipedia has an article on count data with further references: https://en.wikipedia.org/wiki/Count_data

840 questions
66 votes, 1 answer

Why is the square root transformation recommended for count data?

It is often recommended to take the square root when you have count data. (For some examples on CV, see @HarveyMotulsky's answer here, or @whuber's answer here.) On the other hand, when fitting a generalized linear model with a response variable…
36 votes, 1 answer

Error metrics for cross-validating Poisson models

I'm cross validating a model that's trying to predict a count. If this was a binary classification problem, I'd calculate out-of-fold AUC, and if this was a regression problem I'd calculate out-of-fold RMSE or MAE. For a Poisson model, what error…
Zach
36 votes, 3 answers

Is a "hurdle model" really one model? Or just two separate, sequential models?

Consider a hurdle model predicting count data y from a normal predictor x:

    set.seed(1839)
    # simulate poisson with many zeros
    x <- rnorm(100)
    e <- rnorm(100)
    y <- rpois(100, exp(-1.5 + x + e))
    # how many zeroes?
    table(y == 0)
    FALSE TRUE 31 …
Mark White
35 votes, 5 answers

Why is Poisson regression used for count data?

I understand that for certain datasets such as voting it performs better. Why is Poisson regression used over ordinary linear regression or logistic regression? What is the mathematical motivation for it?
zaxtax
32 votes, 2 answers

Diagnostics for generalized linear (mixed) models (specifically residuals)

I am currently struggling with finding the right model for difficult count data (dependent variable). I have tried various different models (mixed effects models are necessary for my kind of data) such as lmer and lme4 (with a log transform) as well…
28 votes, 2 answers

Continuous generalization of the negative binomial distribution

The negative binomial (NB) distribution is defined on non-negative integers and has probability mass function $$f(k;r,p)=\binom{k+r-1}{k}p^{k}(1-p)^{r}.$$ Does it make sense to consider a continuous distribution on non-negative reals defined by the…
24 votes, 1 answer

When to use Poisson vs. geometric vs. negative binomial GLMs for count data?

I'm trying to lay out for myself when it's appropriate to use which regression type (geometric, Poisson, negative binomial) with count data, within the GLM framework (only 3 of the 8 GLM distributions are used for count data, although most of what…
24 votes, 4 answers

Is this an appropriate method to test for seasonal effects in suicide count data?

I have 17 years (1995 to 2011) of death certificate data related to suicide deaths for a state in the U.S. There is a lot of mythology out there about suicides and the months/seasons, much of it contradictory, and of the literature I've reviewed, I…
svannoy
23 votes, 9 answers

Time series for count data, with counts < 20

I recently started working for a tuberculosis clinic. We meet periodically to discuss the number of TB cases we're currently treating, the number of tests administered, etc. I'd like to start modeling these counts so that we're not just guessing…
Matt Parker
22 votes, 1 answer

Detecting outliers in count data

I have what I naively thought to be a fairly straightforward problem that involves outlier detection for many different sets of count data. Specifically, I want to determine if one or more values in a series of count data is higher or lower than…
Joe Gomphus
19 votes, 2 answers

Poisson or quasi-Poisson in a regression with count data and overdispersion?

I have count data (a demand/offer analysis counting the number of customers, which may depend on many factors). I tried a linear regression with normal errors, but my QQ-plot does not look good. I tried a log transformation of the response: once…
17 votes, 4 answers

Strategy for deciding appropriate model for count data

What is the appropriate strategy for deciding which model to use with count data? I have count data that I need to model as a multilevel model and it was recommended to me (on this site) that the best way to do this is through BUGS or MCMCglmm.…
17 votes, 3 answers

Predicting count data with random forest?

Can a Random Forest be trained to appropriately predict count data? How would this proceed? I have quite an extensive range of values so classification doesn't really make sense. If I used regression, would I simply truncate the results? I'm…
JEquihua
17 votes, 4 answers

Zero-inflated negative binomial mixed-effects model in R

Is there a package that provides zero-inflated negative binomial mixed-effects model estimation in R? By that I mean: Zero-inflation where you can specify the binomial model for zero inflation, like in function zeroinfl in package pscl:…
16 votes, 1 answer

Significance of difference between two counts

Is there a way to determine whether a count of road accidents at time 1 is significantly different from a count at time 2? I have found different methods for determining the difference between groups of observations at…
jessop