Questions tagged [count-data]

Count data are non-negative integers representing whole amounts.

When count data are the dependent variable in a regression, Poisson or negative binomial regression may be appropriate. One common problem is "zero-inflation", where the proportion of zeros is greater than the assumed distribution predicts; various models, such as zero-inflated and hurdle models, deal with this.
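
As a rough illustration of what "more zeros than predicted" means, the sketch below compares the observed share of zeros in a sample with the share a Poisson distribution with the same mean would give, P(Y = 0) = exp(-mean). The function name `zero_fractions` and the sample values are hypothetical, chosen only for this example.

```python
import math


def zero_fractions(counts):
    """Return (observed, expected) proportion of zeros in `counts`.

    `expected` is P(Y = 0) = exp(-mean) under a Poisson distribution whose
    mean equals the sample mean (the Poisson maximum-likelihood estimate).
    """
    if not counts:
        raise ValueError("counts must be non-empty")
    if any(c < 0 or c != int(c) for c in counts):
        raise ValueError("counts must be non-negative integers")
    n = len(counts)
    mean = sum(counts) / n
    observed = sum(1 for c in counts if c == 0) / n
    expected = math.exp(-mean)
    return observed, expected


# Hypothetical sample (not real data): 6 zeros among 10 counts, mean 1.5.
counts = [0, 0, 0, 0, 0, 0, 1, 2, 4, 8]
observed, expected = zero_fractions(counts)
print(f"observed zeros: {observed:.2f}, Poisson-expected zeros: {expected:.2f}")
```

This is only a screening check on the marginal distribution, not a formal test: an excess of zeros can also come from overdispersion or from covariates that a regression model would account for.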

Wikipedia has an article on count data with further references: https://en.wikipedia.org/wiki/Count_data

840 questions

1 answer

Why is the square root transformation recommended for count data?

It is often recommended to take the square root when you have count data. (For some examples on CV, see @HarveyMotulsky's answer here, or @whuber's answer here.) On the other hand, when fitting a generalized linear model with a response variable…

asked Dec 22 '12 at 03:11

gung - Reinstate Monica

1 answer

Error metrics for cross-validating Poisson models

I'm cross validating a model that's trying to predict a count. If this was a binary classification problem, I'd calculate out-of-fold AUC, and if this was a regression problem I'd calculate out-of-fold RMSE or MAE. For a Poisson model, what error…

cross-validation poisson-distribution count-data deviance scoring-rules

asked Oct 02 '13 at 18:56

Zach

3 answers

Is a "hurdle model" really one model? Or just two separate, sequential models?

Consider a hurdle model predicting count data y from a normal predictor x:

    set.seed(1839)
    # simulate poisson with many zeros
    x <- rnorm(100)
    e <- rnorm(100)
    y <- rpois(100, exp(-1.5 + x + e))
    # how many zeroes?
    table(y == 0)

    FALSE  TRUE
       31 …

r count-data zero-inflation

asked Dec 30 '17 at 17:35

Mark White

5 answers

Why is Poisson regression used for count data?

I understand that for certain datasets such as voting it performs better. Why is Poisson regression used over ordinary linear regression or logistic regression? What is the mathematical motivation for it?

count-data poisson-regression

asked Sep 23 '10 at 19:38

zaxtax

2 answers

Diagnostics for generalized linear (mixed) models (specifically residuals)

I am currently struggling with finding the right model for difficult count data (dependent variable). I have tried various different models (mixed effects models are necessary for my kind of data) such as lmer and lme4 (with a log transform) as well…

generalized-linear-model residuals negative-binomial-distribution count-data glmm

asked Dec 07 '15 at 15:53

fsociety

2 answers

Continuous generalization of the negative binomial distribution

The negative binomial (NB) distribution is defined on non-negative integers and has probability mass function $$f(k;r,p)=\binom{k+r-1}{k}p^{k}(1-p)^{r}.$$ Does it make sense to consider a continuous distribution on non-negative reals defined by the…

distributions negative-binomial-distribution count-data continuous-data bioinformatics

asked Oct 29 '17 at 22:47

amoeba

1 answer

When to use Poisson vs. geometric vs. negative binomial GLMs for count data?

I'm trying to lay out for myself when it's appropriate to use which regression type (geometric, Poisson, negative binomial) with count data, within the GLM framework (only 3 of the 8 GLM distributions are used for count data, although most of what…

generalized-linear-model negative-binomial-distribution count-data poisson-regression zero-inflation

asked Jun 09 '15 at 16:27

timothy.s.lau

4 answers

Is this an appropriate method to test for seasonal effects in suicide count data?

I have 17 years (1995 to 2011) of death certificate data related to suicide deaths for a state in the U.S. There is a lot of mythology out there about suicides and the months/seasons, much of it contradictory, and of the literature I've reviewed, I…

r chi-squared-test arima count-data seasonality

asked Apr 03 '15 at 22:47

svannoy

9 answers

Time series for count data, with counts < 20

I recently started working for a tuberculosis clinic. We meet periodically to discuss the number of TB cases we're currently treating, the number of tests administered, etc. I'd like to start modeling these counts so that we're not just guessing…

r time-series poisson-distribution count-data epidemiology

asked Jul 19 '10 at 23:37

Matt Parker

1 answer

Detecting outliers in count data

I have what I naively thought to be a fairly straightforward problem that involves outlier detection for many different sets of count data. Specifically, I want to determine if one or more values in a series of count data is higher or lower than…

outliers count-data fitting

asked Apr 17 '13 at 20:11

Joe Gomphus

2 answers

Poisson or quasi poisson in a regression with count data and overdispersion?

I have count data (demand/offer analysis counting the number of customers, depending on possibly many factors). I tried a linear regression with normal errors, but my QQ-plot is not really good. I tried a log transformation of the response: once…

count-data poisson-regression overdispersion quasi-likelihood

asked Jan 09 '12 at 16:50

Antonin

4 answers

Strategy for deciding appropriate model for count data

What is the appropriate strategy for deciding which model to use with count data? I have count data that I need to model as a multilevel model, and it was recommended to me (on this site) that the best way to do this is through BUGS or MCMCglmm.…

generalized-linear-model poisson-distribution count-data negative-binomial-distribution overdispersion

asked Feb 23 '11 at 11:05

George Michaelides

3 answers

Predicting count data with random forest?

Can a random forest be trained to appropriately predict count data? How would this proceed? I have quite an extensive range of values, so classification doesn't really make sense. If I used regression, would I simply truncate the results? I'm…

r regression random-forest prediction count-data

asked Feb 25 '13 at 05:37

JEquihua

4 answers

Zero-inflated negative binomial mixed-effects model in R

Is there a package that provides zero-inflated negative binomial mixed-effects model estimation in R? By that I mean: zero-inflation where you can specify the binomial model for zero inflation, like in the function zeroinfl in the package pscl:…

r mixed-model count-data negative-binomial-distribution zero-inflation

asked Sep 28 '12 at 15:02

Nikita Samoylov

1 answer

Significance of difference between two counts

Is there a way to determine whether a count of road accidents at time 1 is significantly different from a count at time 2? I have found different methods for determining the difference between groups of observations at…

statistical-significance count-data

asked Jun 03 '15 at 11:45

jessop
