Exposure in Negative Binomial regression and other distributions

Question

For a Poisson regression we can model exposure $\epsilon_i$ in observation $Y_i$ as $Y_i \sim Poisson(\epsilon_i*\lambda)$.

For instance, in a Poisson regression, if we observe:

$y = \begin{bmatrix} 2 & 3 & 4 & 5 \end{bmatrix}$

With exposures $\epsilon = \begin{bmatrix} 2 & 1 & 1 & 1 \end{bmatrix}$

And covariate $x = \begin{bmatrix} 2.1 & 3.1 & 4.3 & 5.2 \end{bmatrix}$

We'll have a regression with the same coefficients as if we had observed:

$y = \begin{bmatrix} 1 & 1 & 3 & 4 & 5 \end{bmatrix}$

With exposures $\epsilon = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \end{bmatrix}$

And covariate $x = \begin{bmatrix} 2.1 & 2.1 & 3.1 & 4.3 & 5.2 \end{bmatrix}$

This is because, for observations that share the same $\lambda$, the Poisson likelihood can be written as (proportional to) $\lambda^{\sum Y} e^{-\lambda \sum \epsilon}$, so the fit does not change as long as we keep the total sum of the y's and the total exposure constant. This, correct me if I'm wrong, follows from the Poisson mean and variance being equal.
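
To spell that proportionality out (a short derivation, assuming the observations being merged share the same covariate value and hence the same $\lambda$):

$$\prod_i \frac{(\epsilon_i \lambda)^{y_i} e^{-\epsilon_i \lambda}}{y_i!} = \left(\prod_i \frac{\epsilon_i^{y_i}}{y_i!}\right) \lambda^{\sum_i y_i} e^{-\lambda \sum_i \epsilon_i} \propto \lambda^{\sum_i y_i} e^{-\lambda \sum_i \epsilon_i}$$

In the example above, the single observation $y = 2$ with $\epsilon = 2$ at $x = 2.1$ contributes $\lambda^{2} e^{-2\lambda}$, and the two observations $y = 1, 1$ with $\epsilon = 1, 1$ contribute $\lambda^{1+1} e^{-(1+1)\lambda}$, which is the same function of $\lambda$.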

This does not follow for a negative binomial because its variance is different from its mean.
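
A small numerical check of this claim, as a sketch only: it compares the log-likelihood of one merged observation ($y = 2$, exposure 2) with that of two split observations ($y = 1$ each, exposure 1 each) over a grid of rates. The grid `mu` and the fixed size parameter `theta = 1.5` are arbitrary values assumed for illustration, and the negative binomial uses the mean/size parametrization of base R's `dnbinom`.

```r
# Grid of candidate rates (arbitrary illustrative values, an assumption).
mu <- c(0.5, 1, 2, 4)
theta <- 1.5  # assumed fixed NB size parameter, chosen only for illustration

# Poisson: merged observation (y = 2, exposure 2) minus two split
# observations (y = 1 each, exposure 1 each), on the log scale.
pois_diff <- dpois(2, lambda = 2 * mu, log = TRUE) -
  2 * dpois(1, lambda = mu, log = TRUE)

# Negative binomial (mean/size parametrization), same comparison.
nb_diff <- dnbinom(2, size = theta, mu = 2 * mu, log = TRUE) -
  2 * dnbinom(1, size = theta, mu = mu, log = TRUE)

# If the difference does not depend on mu, both layouts give the same
# likelihood up to a constant and therefore the same maximizer.
print(diff(range(pois_diff)))  # spread of the Poisson difference across mu
print(diff(range(nb_diff)))    # spread of the NB difference across mu
```

For the Poisson the spread should be zero up to floating-point error, while for the negative binomial the difference changes with `mu`, so the two data layouts are not interchangeable there.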

Here's an example in R to illustrate that this holds for Poisson but not for negative binomial:

library(MASS)  # provides the quine data and glm.nb()
# Exposure/weight vector: 2 for the first row, 1 for all other rows
w <- c(2, rep(1, nrow(quine) - 1))

# These 3 have the same coefficients:
glm(c(Days[1] * 2, Days[-1]) ~ Sex/(Age + Eth * Lrn) + offset(log(w)), data = quine, family = poisson())
glm(Days ~ Sex/(Age + Eth * Lrn), data = quine, weights = w, family = poisson())
glm(Days ~ Sex/(Age + Eth * Lrn), data = rbind(quine[1, ], quine), family = poisson())

glm.nb(c(Days[1] * 2, Days[-1]) ~ Sex/(Age + Eth * Lrn) + offset(log(w)), data = quine)
# The above produces different coefficients from the following two:
glm.nb(Days ~ Sex/(Age + Eth * Lrn), data = quine, weights = w)
glm.nb(Days ~ Sex/(Age + Eth * Lrn), data = rbind(quine[1, ], quine))

So, can we still use exposure in the same fashion for it, as in $Y_i \sim Negative\_binomial(\epsilon_i*\mu, \phi)$ (parametrized by mean and overdispersion)? What about other distributions: could this idea of exposure be applied to, say, a binomial? How? Would we be obliged to parametrize it by the mean?

Relevant question: http://stats.stackexchange.com/q/66792/7071 — dimitriy, Feb 21 '14 at 08:23
