Suppose I have 1000 count responses, each drawn from a Poisson distribution whose mean (and hence variance) lies somewhere in the wide range 1 to 100. There is one explanatory variable, which is roughly the mean of each response. Shouldn't this be the perfect case for Poisson regression?

When I run the usual diagnostic tests found here, they all indicate that the data are overdispersed. I assumed overdispersion only occurs when a response has a variance greater than its mean. Is this incorrect? Since every response is drawn from a Poisson distribution, shouldn't the data show no overdispersion at all?

I appreciate I could be way off with my understanding here. Any help is greatly appreciated.

The R code I used to simulate the dataset is given below:

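# Simulate 1000 Poisson counts, each with its own mean drawn from U(1, 100)
dist_length <- 1000
counts <- rep(0, dist_length)
lambdas <- rep(0, dist_length)
for (i in 1:dist_length)
{
    lambda <- runif(1, 1, 100)                  # true mean of response i
    lambdas[i] <- lambda + runif(1, -0.1, 0.1)  # explanatory variable: true mean plus small noise
    counts[i] <- rpois(1, lambda)               # response drawn from Poisson(lambda)
}
poiss_dist <- data.frame('Count'=counts, 'Mean'=lambdas)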
  • Each of your counts is coming from a different Poisson distribution, that is, each has a different mean. Hence the overdispersion. The following would produce a random sample of 1000 counts with a population mean of 2 and a population variance of 2: rpois(1000, 2). – dbwilson Aug 18 '20 at 16:52
  • @dbwilson Thanks for the reply. Doesn't Poisson regression assume that each response has a different mean that is calculated by the log of the linear predictor? – SportcastJon Aug 19 '20 at 17:13
  • Yes, but that is not the case in your data. Each count comes from a Poisson distribution with lambda as the mean, but each of your lambdas[i] is lambda plus some random noise. Thus, it does not fully account for overdispersion. – dbwilson Aug 20 '20 at 00:10
  • @dbwilson I introduced a very small random noise to better simulate an explanatory variable. The results are the same if the noise is removed. – SportcastJon Aug 20 '20 at 11:42
  • Hmm ... because you are entering lambda as a linear predictor, it is assuming a linear relationship between lambda and log count. Are you entering lambda or log(lambda)? I know that if you have a simple model with, for example, three groups, each from a different Poisson distribution, and your model includes an intercept and two dummy variables for two of the groups, that this works out. Any remaining overdispersion is the result of sampling error. – dbwilson Aug 20 '20 at 12:13
  • @dbwilson Ah, I think it's clicked, thanks. If I take the log of lambda as the explanatory variable, then my linear predictor will have the correct relationship with the response mean. This will cause my dispersion test to show a dispersion of 1. I guess overdispersion in many cases can be attributed to poorly chosen or missing explanatory variables. – SportcastJon Aug 21 '20 at 10:00
  • Yes, but it might not be exactly 1, given sampling error. – dbwilson Aug 21 '20 at 13:23
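
The conclusion of the comment thread can be checked with a short simulation. The sketch below is not from the original post beyond the data-generating step: the seed, the helper name `dispersion`, and the object names `fit_raw` and `fit_log` are assumptions introduced for illustration, and the Pearson statistic is used as one common dispersion estimate rather than the specific tests the question links to. With a log link the model is log(mu) = b0 + b1 * x, so entering x = Mean forces the fitted mean to be exponential in Mean, whereas entering x = log(Mean) allows mu = Mean exactly (b0 = 0, b1 = 1).

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

# Same data-generating process as in the question, written in vectorised form
n      <- 1000
lambda <- runif(n, 1, 100)               # true Poisson mean of each response
mean_x <- lambda + runif(n, -0.1, 0.1)   # explanatory variable: mean plus small noise
count  <- rpois(n, lambda)
poiss_dist <- data.frame(Count = count, Mean = mean_x)

# Hypothetical helper: Pearson dispersion estimate,
# i.e. sum of squared Pearson residuals divided by the residual degrees of freedom
dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

# Misspecified: log(mu) is modelled as linear in Mean
fit_raw <- glm(Count ~ Mean, family = poisson(link = "log"), data = poiss_dist)

# Correctly specified: log(mu) is modelled as linear in log(Mean)
fit_log <- glm(Count ~ log(Mean), family = poisson(link = "log"), data = poiss_dist)

disp_raw <- dispersion(fit_raw)
disp_log <- dispersion(fit_log)

print(c(raw = disp_raw, log = disp_log))
print(coef(fit_log))
```

Under these assumptions, `disp_raw` should come out clearly above 1 even though every count is genuinely Poisson, because the lack of fit in the mean shows up as apparent overdispersion. `disp_log` should sit close to 1, though not exactly 1 because of sampling error, with an intercept near 0 and a slope near 1.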
