When is it appropriate to model count data as continuous?

Question

I have time series of several variables, each with about 60 rows of count data. I want to fit a regression model y ~ x. I've chosen quasi-Poisson and negative binomial GLMs because the data are overdispersed.

x
Min.   : 24000  
1st Qu.: 72000  
Median :117095  
Mean   :197607  
3rd Qu.:291388  
Max.   :607492  

y
Min.   : 136345
1st Qu.: 405239
Median : 468296
Mean   : 515937
3rd Qu.: 633089
Max.   :1218937

The counts themselves are very large, so it may be best to model them as continuous (this is what I'm trying to investigate: at what point can count data be modelled as continuous?). This seems to be very common practice; what I want to know is the motivation for it.

Are there any texts that actually show the problem of modelling high count data with a Poisson distribution? Perhaps something that shows that the factorial in the distribution makes things difficult.
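To make the factorial concern concrete, here is a minimal sketch in Python (standard library only). The function names are hypothetical and the count 468296 is simply the median of y from the summary above. It suggests that the factorial is a problem only for a naive evaluation of the pmf; working on the log scale with `math.lgamma` avoids forming k! at all.

```python
import math

def naive_poisson_pmf(k: int, mu: float) -> float:
    """Textbook formula mu^k * exp(-mu) / k!, evaluated directly."""
    # mu ** k overflows a double long before k reaches the counts in the question.
    return mu ** k * math.exp(-mu) / math.factorial(k)

def poisson_logpmf(k: int, mu: float) -> float:
    """log P(K = k) for K ~ Poisson(mu), computed on the log scale."""
    # lgamma(k + 1) equals log(k!), so the factorial itself is never formed.
    return k * math.log(mu) - mu - math.lgamma(k + 1)

k, mu = 468296, 468296.0  # median of y, used here purely as an illustration

try:
    naive_poisson_pmf(k, mu)
    naive_failed = False
except OverflowError:
    naive_failed = True

print("naive formula overflowed:", naive_failed)
print("log-scale pmf at the mean:", poisson_logpmf(k, mu))
```

This only illustrates the numerical side. Whether a particular GLM routine handles such counts well still depends on how it is implemented.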

It's small numbers (small expected counts) where it's critical to model as count data. Whether there's a calculation problem at some size of count will depend on the software, but I don't see that a carefully implemented calculation should have a problem with those counts. Either way, they're certainly large enough to approximate by normal distributions, via nonlinear least squares or Iterative Reweighted Least Squares, say, but that, too, would need to be carefully implemented. — Glen_b, Oct 21 '13 at 20:37
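As a rough illustration of the comment above, the following Python sketch (standard library only, hypothetical function names) compares the Poisson pmf with a normal density that has the same mean and variance, over the range within three standard deviations of the mean. It assumes a plain Poisson model and says nothing about the overdispersion mentioned in the question. The value 136345 is the minimum of y from the summary, and 5 stands in for a small expected count.

```python
import math

def poisson_logpmf(k: int, mu: float) -> float:
    """log P(K = k) for K ~ Poisson(mu), via lgamma to avoid k!."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def normal_logpdf(x: float, mean: float, var: float) -> float:
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def max_relative_error(mu: float, n_sd: float = 3.0) -> float:
    """Largest |normal / Poisson - 1| over integers within n_sd standard deviations of mu."""
    sd = math.sqrt(mu)
    lo = max(0, math.floor(mu - n_sd * sd))
    hi = math.ceil(mu + n_sd * sd)
    worst = 0.0
    for k in range(lo, hi + 1):
        log_ratio = normal_logpdf(k, mu, mu) - poisson_logpmf(k, mu)
        worst = max(worst, abs(math.expm1(log_ratio)))
    return worst

for mu in (5.0, 136345.0):
    print(f"mu = {mu:>9.0f}: max relative error = {max_relative_error(mu):.4f}")
```

The relative error shrinks roughly like 1/sqrt(mu), which is one way to see why small expected counts are where the count-data model matters most.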
Are there any references / journals / books that say this? I need to be able to justify using the linear model. — phg, Nov 20 '13 at 16:32
See https://stats.stackexchange.com/questions/342635/closest-approximation-of-a-poisson-glm-using-weighted-least-squares-analysis-to — kjetil b halvorsen, Oct 03 '20 at 18:06
