Why are my estimated coefficients different?
They should be different. You're no longer modeling count data. You're modeling rates.
The offset is just like any other predictor in a linear model, the coefficients of the other terms shouldn't change when it is uncorrelated.
No. The offset is not your typical covariate. The offset is a predictor whose coefficient is constrained to equal 1. If you move the offset to the left-hand side and apply the properties of logarithms, you end up with the log of your expected outcome divided by your offset. See this post for more information on the derivation. Note, once you divide by your offset/exposure (e.g., time, population size, geographic area, etc.), your coefficient on $x$ should change.
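The rearrangement described above can be written out as follows. This is only a sketch for a single predictor with a log link; the symbols $\beta_0$ and $\beta_1$ are notation introduced here for the intercept and the coefficient on $x$, and $e$ is the exposure.

$$
\log \mathbb{E}[y \mid x] = \beta_0 + \beta_1 x + 1 \cdot \log e
\quad\Longrightarrow\quad
\log\left(\frac{\mathbb{E}[y \mid x]}{e}\right) = \beta_0 + \beta_1 x
$$

The left-hand side of the second equation is the log of an expected rate, which is why $\beta_1$ is read as an effect on the rate rather than on the raw count.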
The R code below should help with the intuition. I modeled the outcome using two methods that produce the same coefficient estimates. The first method uses offset(.) inside of the glm() function. The second method models the rate explicitly. Note, once we divide the outcome $y$ by the exposure $e$, it alters the variance of the response. To correct for this, we weight by the exposure (i.e., weights = e) when fitting the model. Both approaches produce identical coefficients.
# R Example (Poisson Exposures)
set.seed(13)
x <- rnorm(100, sd = 0.1)
y <- rpois(100, exp(5 * x))
e <- rpois(100, 5) + 1 # this is your offset/exposure (the + 1 keeps it positive so log(e) is defined)
y_weighted <- y / e # observed rate: outcome divided by your offset/exposure
### --- Using offset(.)
mod_1 <- glm(y ~ x + offset(log(e)), family = 'poisson')
### --- Using the weighted outcome
mod_2 <- glm(y_weighted ~ x, family = 'poisson', weights = e)
round(mod_1$coefficients, 3)
# (Intercept)           x
#      -1.904       5.907
round(mod_2$coefficients, 3)
# (Intercept)           x
#      -1.904       5.907
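The agreement between the two fits can also be checked without relying on rounding. The snippet below is a sketch that re-creates the simulated data from the example; the names mod_offset, mod_rate, same_coef and same_se are made up for this illustration, the tolerance of 1e-4 is an arbitrary choice, and suppressWarnings() is only there to hide the "non-integer" warnings R raises when a Poisson family is given a non-integer response.

```r
# Re-create the simulated data from the example above
set.seed(13)
x <- rnorm(100, sd = 0.1)
y <- rpois(100, exp(5 * x))
e <- rpois(100, 5) + 1

# Fit 1: counts, with log-exposure entered as an offset
mod_offset <- glm(y ~ x + offset(log(e)), family = poisson)

# Fit 2: rates (y / e), weighted by the exposure
mod_rate <- suppressWarnings(
  glm(I(y / e) ~ x, family = poisson, weights = e)
)

# Compare coefficients and standard errors up to numerical tolerance
same_coef <- isTRUE(all.equal(coef(mod_offset), coef(mod_rate),
                              tolerance = 1e-4))
same_se <- isTRUE(all.equal(sqrt(diag(vcov(mod_offset))),
                            sqrt(diag(vcov(mod_rate))),
                            tolerance = 1e-4))
print(c(same_coef = same_coef, same_se = same_se))
```

Comparing the standard errors as well is a way to see that weighting by the exposure restores the variance structure that dividing $y$ by $e$ changed.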
Again, there is no coefficient estimated on your exposure variable. You are not holding $e$ fixed while assessing the impact of $x$ on $y$. You are, in fact, dividing the outcome by $e$ (i.e., the offset/exposure). Correlation between the exposure variable and the other regressors shouldn't concern you in this setting. See this answer for more information.
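To see the difference between an offset and an ordinary covariate, one can fit the same simulated data both ways and look at which coefficients are estimated. This is a sketch only; mod_offset and mod_covar are names introduced for this illustration, and the second model is shown for contrast, not as a recommendation.

```r
# Same simulated data as in the example above
set.seed(13)
x <- rnorm(100, sd = 0.1)
y <- rpois(100, exp(5 * x))
e <- rpois(100, 5) + 1

# log(e) as an offset: its coefficient is fixed at 1 and never estimated
mod_offset <- glm(y ~ x + offset(log(e)), family = poisson)

# log(e) as an ordinary covariate: its coefficient is a free parameter
mod_covar <- glm(y ~ x + log(e), family = poisson)

names(coef(mod_offset))  # "(Intercept)" "x"
names(coef(mod_covar))   # "(Intercept)" "x" "log(e)"
```

Only the second model reports a coefficient for log(e). In the first, the exposure enters the linear predictor with a fixed multiplier of 1, which is what turns the model for counts into a model for rates.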