
I am trying to model incidence rates (number of malaria cases per 1000 people per month) over time. I only have the data in the form of rates per 1000 people per month, i.e. I do not know the total population size each month. For example, in January there were 3.3 cases per 1000 individuals, in February there were 5.6 cases per 1000 individuals, and so on.

What I am unsure about is whether I can use a Poisson regression model on the rates themselves (i.e. 5.6 and 3.3), without using an offset or a weighted Poisson regression. I haven't done either of these things, as the population would always be 1000, so I don't think using an offset or a weighted Poisson regression approach would achieve anything.

I am just not sure whether my approach is valid when the "count data" aren't the actual numbers of events, and indeed aren't count data at all, since they are not integers.

kjetil b halvorsen
marty237

1 Answer


Short: You can use a quasi-Poisson regression.

You have data for $n$ regions/populations/months, each of size $N_i$ (which could vary from month to month). The number of malaria cases is $Y_i$, but that is not given to you, nor are the $N_i$ known to you. Apart from trying to obtain that missing information, you have to work with what is given, namely the rates $$\DeclareMathOperator{\E}{\mathbb{E}} R_i =1000\frac{Y_i}{N_i}.$$ We start with a Poisson regression model in terms of $Y_i$, $$ \E [Y_i \mid x_i] = \lambda_i, \quad \log\lambda_i= \log N_i + x_i^T\beta, $$ which we cannot use directly, since $Y_i$ is not known.

Rewriting in terms of $R_i$ we get $$ \E [R_i \mid x_i] = 1000\frac{\lambda_i}{N_i} = 1000\exp(x_i^T\beta), $$ with logarithm $x_i^T\beta +\log 1000$. This is a quasi-Poisson model for $R_i$ with offset $\log 1000$, but since the offset is the same for all observations it can simply be absorbed into the intercept. Note that even with $N_i$ unknown there is no problem with the expectation structure, since the expectation does not depend on $N_i$. There is one problem, though: the quasi-Poisson model assumes that the variance is proportional to the expectation, with the same constant of proportionality for every observation, and that fails here when the unknown $N_i$ vary. To see this, assume that $Y_i \mid X_i=x_i$ is Poisson; then $$ \DeclareMathOperator{\V}{\mathbb{V}} \V [R_i \mid x_i]=\frac{1000^2 \lambda_i}{N_i^2} = \frac{1000}{N_i}\,\E [R_i \mid x_i], $$ so the dispersion factor $1000/N_i$ changes from observation to observation (only if all $N_i$ were equal would it be a single constant). So while the expectation structure is right, the variance structure is not, and it may be wise to use robust (sandwich) standard errors.
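A minimal sketch of this approach in plain Python, assuming a log-linear time trend as the only covariate and made-up rates for the months after February (only 3.3 and 5.6 come from the question). It fits the Poisson quasi-likelihood directly to the non-integer rates by iteratively reweighted least squares, reports both the quasi-Poisson standard errors (Pearson dispersion) and HC0 sandwich standard errors, and checks that rescaling the response from per 1000 to per person only moves the intercept by $\log 1000$. The helper names `solve`, `fit_quasi_poisson` and `standard_errors` are hypothetical; in practice one would use a GLM routine, for example R's `glm()` with `family = quasipoisson` together with the `sandwich` package.

```python
import math

def solve(A, b):
    """Solve A x = b for a small dense system (Gauss-Jordan, partial pivoting)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def inverse(A):
    """Invert a small matrix by solving for each unit vector."""
    n = len(A)
    cols = [solve(A, [float(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def cross(X, v):
    """Return sum_i v_i * x_i x_i^T over the rows x_i of X."""
    p = len(X[0])
    return [[sum(vi * x[a] * x[b] for vi, x in zip(v, X)) for b in range(p)] for a in range(p)]

def fit_quasi_poisson(X, y, iters=100, tol=1e-10):
    """Log-link Poisson quasi-likelihood fit by IRLS; y may be non-integer rates."""
    mu = [yi + 0.1 for yi in y]  # starting values, as R's glm() uses for Poisson
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        # working response for the log link; the working weights are mu itself
        z = [math.log(m) + (yi - m) / m for yi, m in zip(y, mu)]
        rhs = [sum(m * x[a] * zi for m, x, zi in zip(mu, X, z)) for a in range(len(beta))]
        new = solve(cross(X, mu), rhs)
        mu = [math.exp(sum(bj * xj for bj, xj in zip(new, x))) for x in X]
        converged = max(abs(a - b) for a, b in zip(new, beta)) < tol
        beta = new
        if converged:
            break
    return beta, mu

def standard_errors(X, y, mu):
    """Model-based quasi-Poisson SEs and HC0 sandwich (robust) SEs."""
    n, p = len(X), len(X[0])
    bread_inv = inverse(cross(X, mu))  # (X^T W X)^{-1} with W = diag(mu)
    phi = sum((yi - m) ** 2 / m for yi, m in zip(y, mu)) / (n - p)  # Pearson dispersion
    meat = cross(X, [(yi - m) ** 2 for yi, m in zip(y, mu)])  # sum of outer products of scores
    sand = [[sum(bread_inv[a][k] * meat[k][l] * bread_inv[l][b]
                 for k in range(p) for l in range(p)) for b in range(p)] for a in range(p)]
    se_quasi = [math.sqrt(phi * bread_inv[j][j]) for j in range(p)]
    se_robust = [math.sqrt(sand[j][j]) for j in range(p)]
    return se_quasi, se_robust

# Hypothetical monthly rates per 1000: only 3.3 (Jan) and 5.6 (Feb) come from the question.
rates = [3.3, 5.6, 4.1, 6.0, 7.2, 5.9, 8.1, 7.4, 9.0, 8.6, 10.2, 9.7]
X = [[1.0, float(month)] for month in range(len(rates))]  # intercept + linear time trend

beta, mu = fit_quasi_poisson(X, rates)
se_quasi, se_robust = standard_errors(X, rates, mu)
print("beta:", beta)
print("quasi-Poisson SE:", se_quasi)
print("robust (HC0) SE:", se_robust)

# Rescaling to rates per person only shifts the intercept by log(1000).
beta_pp, _ = fit_quasi_poisson(X, [r / 1000 for r in rates])
print("intercept shift:", beta[0] - beta_pp[0], "vs log(1000) =", math.log(1000))
```

The point estimates are the same whichever standard errors are used; only the two uncertainty estimates differ, and the sandwich version does not rely on the variance being a constant multiple of the mean.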

This is a version of Poisson rate regression; see, for instance, the question "How is a Poisson rate regression equal to a Poisson regression with corresponding offset term?"
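To illustrate that equivalence, here is a small sketch with entirely made-up counts $Y_i$ and population sizes $N_i$ (exactly the information the question lacks). It fits (a) a Poisson regression on the counts with offset $\log N_i$ and (b) a Poisson rate regression on $Y_i/N_i$ with prior weights $N_i$, and the two coefficient vectors agree, because both solve $\sum_i x_i\,(Y_i - N_i e^{x_i^T\beta}) = 0$. The function `fit_poisson` is a hypothetical plain-Python IRLS helper, not a library API.

```python
import math

def solve(A, b):
    """Solve A x = b for a small dense system (Gauss-Jordan, partial pivoting)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_poisson(X, y, offset=None, weights=None, iters=100, tol=1e-10):
    """Log-link Poisson (quasi-)likelihood fit by IRLS with optional offset and prior weights."""
    n, p = len(X), len(X[0])
    offset = offset if offset is not None else [0.0] * n
    weights = weights if weights is not None else [1.0] * n
    mu = [yi + 0.1 for yi in y]  # crude starting values
    beta = [0.0] * p
    for _ in range(iters):
        # working response excludes the offset; working weight is w_i * mu_i for the log link
        z = [math.log(m) - o + (yi - m) / m for yi, m, o in zip(y, mu, offset)]
        W = [w * m for w, m in zip(weights, mu)]
        A = [[sum(Wi * x[a] * x[b] for Wi, x in zip(W, X)) for b in range(p)] for a in range(p)]
        rhs = [sum(Wi * x[a] * zi for Wi, x, zi in zip(W, X, z)) for a in range(p)]
        new = solve(A, rhs)
        mu = [math.exp(o + sum(bj * xj for bj, xj in zip(new, x))) for x, o in zip(X, offset)]
        converged = max(abs(a - b) for a, b in zip(new, beta)) < tol
        beta = new
        if converged:
            break
    return beta

# Hypothetical data: monthly case counts Y_i and population sizes N_i (both made up).
cases = [170, 270, 250, 270, 410, 300]
pop = [52000, 48000, 61000, 45000, 57000, 50000]
X = [[1.0, float(t)] for t in range(len(cases))]  # intercept + month index

# (a) Poisson regression on the counts with offset log N_i
beta_offset = fit_poisson(X, cases, offset=[math.log(n) for n in pop])
# (b) Poisson rate regression: rates Y_i / N_i with prior weights N_i
beta_rate = fit_poisson(X, [y / n for y, n in zip(cases, pop)], weights=pop)
print("offset model:", beta_offset)
print("rate model:  ", beta_rate)
```

With the $N_i$ unknown, neither the offset nor the weights are available. Fitting the rates without weights still solves unbiased estimating equations, since $\E[R_i \mid x_i]$ does not involve $N_i$; it only loses efficiency when the $N_i$ differ, and the model-based variance is off, which is why robust standard errors are suggested.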

kjetil b halvorsen