
I have a dataset of counts (responses to a marketing campaign), collected at the zipcode level. I am trying to use a Poisson regression to estimate the underlying response rate of each zipcode.

How do I account for the fact that some zipcodes have very large populations while others have very small populations, so that my estimate of the response rate in the small zipcodes is much more uncertain? Furthermore, a count of 1 in a small zipcode is a lot more meaningful than a count of 1 in a large zipcode.

Is Poisson even the right approach to take here? Some example code in R would be appreciated.

Zach
  • I'm not sure I understand the question. Are you positing a separate response rate for each zip? If so, you kind of already have it, unless you want to set up some hierarchical Poisson model (which will shrink all the estimates towards the global mean; is that what you're looking for?). Why does it bother you that the zipcode-level standard errors are different? – alex Mar 15 '13 at 18:02
  • I'm trying to estimate how various factors affect the response rate. I am positing that, absent these factors, the response rates will be equal in all zip codes. I'm having trouble figuring out how to fit this model. – Zach Mar 15 '13 at 19:10
  • Ahh, I thought you were just trying to estimate the "underlying response rate of each zipcode" – alex Mar 15 '13 at 19:53

1 Answer


You can assume that the counts $y$ are proportional to population $P$. That would mean that \begin{equation}\frac{E[y \vert x]}{P}=\exp\{x'\beta\}.\end{equation} This is algebraically equivalent$^*$ to a model where \begin{equation}E[y \vert x]=\exp\{x'\beta+\log{P}\},\end{equation} which is just the Poisson model with the coefficient on $\log P$ constrained to $1$. This is called a logarithmic offset. One consequence: because the expected count scales with $P$, the marginal effect of a covariate (say, income) on the expected count will be bigger in a more populated zip code, even though its proportional effect on the response rate is the same everywhere.

I think R code would look something like this:

fit <- glm(y ~ x + offset(log(P)), family = poisson(link = "log"))

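You can also test the proportionality assumption by relaxing the constraint, that is, by including $\log P$ as an ordinary regressor instead of an offset, and testing the hypothesis that its coefficient satisfies $\beta_{\log P}=1$.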
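A rough sketch of that test on simulated data might look like the following. Everything here is an assumption made for illustration: the sample size, the population range, the single covariate `x`, and the true coefficients (-4 and 0.3) are invented, and the object names (`fit_offset`, `fit_free`, and so on) are hypothetical.

```r
# Simulated data; all values below are illustrative assumptions, not real data.
set.seed(42)
n <- 500
P <- round(exp(runif(n, log(200), log(50000))))  # zip-code populations
x <- rnorm(n)                                    # one made-up covariate
y <- rpois(n, lambda = P * exp(-4 + 0.3 * x))    # counts proportional to P

# Constrained model: coefficient on log(P) fixed at 1 through the offset.
fit_offset <- glm(y ~ x + offset(log(P)), family = poisson(link = "log"))

# Relaxed model: coefficient on log(P) estimated freely.
fit_free <- glm(y ~ x + log(P), family = poisson(link = "log"))

# Wald test of H0: beta_logP = 1 (note: not the default H0 of 0).
est <- coef(summary(fit_free))["log(P)", ]
z <- (est["Estimate"] - 1) / est["Std. Error"]
p_wald <- 2 * pnorm(-abs(z))

# Likelihood-ratio test: the offset model is the free model with the
# log(P) coefficient restricted to 1, so the two are nested.
lr_test <- anova(fit_offset, fit_free, test = "Chisq")

print(est)
print(p_wald)
print(lr_test)
```

Since the data are generated with counts exactly proportional to `P`, the estimated coefficient on `log(P)` should land close to 1 here. On real data, a clear rejection would suggest that the response rate itself varies with population size.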

And just so there's no confusion:

[Image. Source: John Cook's Endeavour blog]


$^*$ To get identical estimates when using the ratio $y/P$ as the outcome, you would also need to use $P$ as the weight.
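To illustrate the footnote, here is a minimal sketch on simulated data comparing the offset model with a model for the ratio $y/P$ weighted by $P$. The data-generating values and the object names (`fit_offset`, `fit_rate`, `max_diff`) are assumptions for illustration only. I use `quasipoisson` for the ratio model purely to avoid the non-integer warnings that `family = poisson` would produce; the point estimates are not affected by that choice.

```r
# Simulated data; all values below are illustrative assumptions, not real data.
set.seed(1)
n <- 300
P <- round(exp(runif(n, log(200), log(50000))))  # zip-code populations
x <- rnorm(n)                                    # one made-up covariate
y <- rpois(n, lambda = P * exp(-4 + 0.3 * x))    # counts proportional to P

# Count as the outcome, with log(P) as an offset.
fit_offset <- glm(y ~ x + offset(log(P)), family = poisson(link = "log"))

# Ratio y/P as the outcome, with P as the prior weight.
fit_rate <- glm(I(y / P) ~ x, family = quasipoisson(link = "log"),
                weights = P)

# The two sets of coefficients should agree up to the convergence
# tolerance of the fitting algorithm.
max_diff <- max(abs(coef(fit_offset) - coef(fit_rate)))
print(cbind(offset = coef(fit_offset), rate = coef(fit_rate)))
print(max_diff)
```

The coefficients match because the weighted estimating equations for the ratio reduce to the same equations as the offset model. The standard errors reported by the `quasipoisson` fit will generally differ, since that family estimates a dispersion parameter rather than fixing it at 1.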

dimitriy