I have time series of several count variables, each with 60 or so observations. I want to fit the regression model y ~ x. I've chosen to use quasi-Poisson and negative binomial GLMs because there is overdispersion, among other reasons.
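A minimal sketch of what the quasi-Poisson fit does, written in Python with the standard library only. The function names and toy data are hypothetical; in R this would presumably be `glm(y ~ x, family = quasipoisson)`. It fits log(mu) = b0 + b1*x by iteratively reweighted least squares, which gives the same coefficients as an ordinary Poisson GLM. It then estimates the dispersion phi as the Pearson chi-square divided by the residual degrees of freedom. Quasi-Poisson standard errors are the Poisson ones multiplied by sqrt(phi). The negative binomial model (for example `MASS::glm.nb` in R) instead assumes Var(y) = mu + mu^2/theta and is not shown here.

```python
import math


def fit_poisson_glm(x, y, max_iter=50, tol=1e-10):
    """Fit log(mu) = b0 + b1*x by IRLS (Poisson family, log link)."""
    # Start from mu = y + 0.1 so that log(mu) is defined even when y == 0.
    mu = [yi + 0.1 for yi in y]
    eta = [math.log(m) for m in mu]
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Log link: working response z = eta + (y - mu)/mu, working weight w = mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Solve the 2x2 weighted least-squares normal equations by Cramer's rule.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        if converged:
            break
    return b0, b1, mu


def pearson_dispersion(y, mu, n_params=2):
    """Quasi-Poisson dispersion: Pearson chi-square / residual degrees of freedom."""
    chi2 = sum((yi - m) ** 2 / m for yi, m in zip(y, mu))
    return chi2 / (len(y) - n_params)


# Hypothetical toy data with exact means from log(mu) = 1 + 0.5*x,
# so the fit should recover b0 = 1, b1 = 0.5 and a dispersion near 0.
x_demo = [0, 1, 2, 3, 4]
y_demo = [math.exp(1 + 0.5 * xi) for xi in x_demo]
b0, b1, mu_hat = fit_poisson_glm(x_demo, y_demo)
phi = pearson_dispersion(y_demo, mu_hat)
print(b0, b1, phi)
```

With real overdispersed counts, phi comes out well above 1, and that factor is what inflates the quasi-Poisson standard errors.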
```
       x                 y
 Min.   : 24000   Min.   : 136345
 1st Qu.: 72000   1st Qu.: 405239
 Median :117095   Median : 468296
 Mean   :197607   Mean   : 515937
 3rd Qu.:291388   3rd Qu.: 633089
 Max.   :607492   Max.   :1218937
```
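A rough back-of-the-envelope check of the overdispersion, using only the summary of y above (Python, standard library). Under a Poisson model the variance equals the mean, so at a mean of about 516,000 the standard deviation would be only about 718. A Poisson with a mean this large is close to normal, so its interquartile range would be roughly 1.349 times that. The observed IQR is a couple of hundred times wider. This compares the marginal spread of y, which also includes variation explained by x, so it is only suggestive. The proper check is the dispersion of the residuals from the fitted y ~ x model.

```python
import math

# Figures copied from the summary of y above.
y_mean = 515937
y_q1, y_q3 = 405239, 633089

# Under a Poisson model Var(y) = mean, so SD = sqrt(mean).
poisson_sd = math.sqrt(y_mean)
# For a (near-)normal distribution the IQR is about 1.349 * SD.
poisson_iqr = 1.349 * poisson_sd
observed_iqr = y_q3 - y_q1
ratio = observed_iqr / poisson_iqr

print(f"Poisson SD at the mean:  {poisson_sd:.0f}")
print(f"Poisson IQR (approx.):   {poisson_iqr:.0f}")
print(f"Observed IQR of y:       {observed_iqr}")
print(f"Observed / Poisson IQR:  {ratio:.0f}")
```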
The counts are very large, so it may be better to model them as continuous rather than as counts. This is what I'm trying to investigate: at what point can count data be modelled as continuous? Treating large counts as continuous seems to be very common practice, and what I want to know is the motivation for it.
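One way to see when treating counts as continuous becomes reasonable: as the Poisson mean lambda grows, the Poisson pmf approaches a Normal(lambda, lambda) density. The Python sketch below uses hypothetical function names. It compares the two on the log scale at points spanning lambda ± 3 standard deviations and evaluates log(k!) with `math.lgamma`. The largest gap is substantial for lambda = 5 and tiny at the scale of y in the summary above. The normal variance is fixed at lambda, as under the Poisson, so the comparison does not address overdispersion.

```python
import math


def poisson_logpmf(k, lam):
    # log P(Y = k) = k*log(lam) - lam - log(k!), with log(k!) = lgamma(k + 1).
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def normal_logpdf(v, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)


def max_log_gap(lam, z_max=3.0, steps=13):
    """Largest |log Poisson pmf - log Normal(lam, lam) density| at integer k
    spread over lam +/- z_max * sqrt(lam) (k clipped at 0)."""
    sd = math.sqrt(lam)
    gaps = []
    for i in range(steps):
        z = -z_max + 2 * z_max * i / (steps - 1)
        k = max(0, round(lam + z * sd))
        gaps.append(abs(poisson_logpmf(k, lam) - normal_logpdf(k, lam, lam)))
    return max(gaps)


for lam in (5, 50, 515937):
    print(lam, max_log_gap(lam))
```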
Are there any texts that actually demonstrate the problem with modelling large counts using the Poisson distribution? For example, something showing that the factorial in the probability mass function makes things difficult.
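A small Python sketch of where the factorial does and does not cause trouble. The function names are hypothetical, and the three counts (borrowed from the quartiles of y) and candidate means are purely illustrative. Evaluated naively, k! overflows a 64-bit float once k exceeds 170, so the pmf has to be computed on the log scale, for example with log(k!) = lgamma(k + 1). For fitting, though, the -log(y!) term does not depend on mu. The difference in log-likelihood between any two candidate mean vectors therefore equals minus half the difference in Poisson deviance, and the deviance contains no factorial. The code checks this numerically.

```python
import math


def poisson_loglik(y, mu):
    # Full log-likelihood, including the -log(y!) term, computed via lgamma.
    return sum(yi * math.log(m) - m - math.lgamma(yi + 1) for yi, m in zip(y, mu))


def poisson_deviance(y, mu):
    # 2 * sum[y*log(y/mu) - (y - mu)]; the y*log(y/mu) term is taken as 0 when y == 0.
    total = 0.0
    for yi, m in zip(y, mu):
        term = yi * math.log(yi / m) if yi > 0 else 0.0
        total += term - (yi - m)
    return 2 * total


# The factorial itself cannot be held as a double once k exceeds 170.
try:
    float(math.factorial(171))
except OverflowError:
    print("171! overflows a 64-bit float")

# Illustrative counts on the scale of y, and two hypothetical candidate mean vectors.
y = [405239, 468296, 633089]
mu_a = [400000.0, 470000.0, 630000.0]
mu_b = [410000.0, 460000.0, 640000.0]

diff_ll = poisson_loglik(y, mu_a) - poisson_loglik(y, mu_b)
diff_dev = poisson_deviance(y, mu_a) - poisson_deviance(y, mu_b)
# These agree up to rounding: the log(y!) terms cancel in the difference.
print(diff_ll, -diff_dev / 2)
```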