I'm modeling how various landscape and ecological factors affect my count data, and I'd like to evaluate how well my negative binomial model performs over the null. I've specified an offset in my model to account for the fact that my sampled areas aren't all equal.
Here's the full model:
model.nb = glm.nb(tally ~ elev + slp + BA + offset(log(area)), data = data, maxit=1000)
If I want to see how much more deviance this model explains over a null model, should I include the offset variable in my null model? I.e., which of the two following nulls should I use?
# Null with offset
model.nb.null.off = glm.nb(tally ~ 1 + offset(log(area)), data = data, maxit=1000)
# Null without offset
model.nb.null = glm.nb(tally ~ 1, data = data, maxit=1000)
Because the offset is a predictor variable with its coefficient fixed at 1, I could rationalize dropping it in the null, since the null drops all predictor variables except the intercept. However, that offset fundamentally changes my tallies into rates, and a rate is really what I should be modelling for each site, given the differences in sampling area.
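To make that "coefficient fixed at 1" point concrete, here's a sketch on simulated data (the variable names `elev`, `area`, and `tally` echo my real ones, but all values and true coefficients here are made up for illustration). It compares the offset formulation against letting `log(area)` enter as a free predictor:

```r
## Sketch (simulated data): an offset is a predictor whose coefficient
## is pinned at 1, so the model effectively describes a rate per unit area.
library(MASS)

set.seed(1)
n <- 200
data <- data.frame(elev = rnorm(n), area = runif(n, 0.5, 2))
## Simulate counts whose mean scales proportionally with area
data$tally <- rnbinom(n, mu = exp(0.5 + 0.3 * data$elev + log(data$area)),
                      size = 2)

## Offset version: log(area) enters with its coefficient fixed at 1
m.off  <- glm.nb(tally ~ elev + offset(log(area)), data = data)
## Free-coefficient version: log(area) estimated like any other predictor
m.free <- glm.nb(tally ~ elev + log(area), data = data)

## The estimated log(area) coefficient should be close to 1 in this
## simulation, since the data were generated with exact proportionality
coef(m.free)["log(area)"]
```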
N.b., this is how I intend to compare the two models:
## Proportional increase in explained deviance (aka pseudo R^2)
# Option 1 (with offset in null)
(deviance(model.nb.null.off) - deviance(model.nb))/deviance(model.nb.null.off) * 100
# Option 2 (without offset in null)
(deviance(model.nb.null) - deviance(model.nb))/deviance(model.nb.null) * 100
Thanks for any guidance!