I'm modeling how various landscape and ecological factors affect my count data, and I'd like to evaluate how well my negative binomial model performs over the null. I've specified an offset in my model to account for the fact that my sampled areas aren't all equal.
Here's the full model:
model.nb = glm.nb(tally ~ elev + slp + BA + offset(log(area)), data = data, maxit=1000)
If I want to see how much more deviance this model explains over a null model, should I include the offset variable in my null model? I.e., which of the two following nulls should I use?
# Null with offset
model.nb.null.off = glm.nb(tally ~ 1 + offset(log(area)), data = data, maxit=1000)
# Null without offset
model.nb.null = glm.nb(tally ~ 1, data = data, maxit=1000)
Because the offset is a predictor variable with its coefficient fixed at 1, I could rationalize dropping it in the null, since the null drops all predictor variables except the intercept. However, that offset fundamentally changes my tallies into rates, and a rate is really what I should be modelling for each site, given the differences in sampling area.
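To make that "coefficient fixed at 1" point concrete, here's a sketch on simulated data (the variable names `elev`, `area`, and `tally` echo my real ones, but all values and true coefficients here are made up for illustration). It compares the offset formulation against letting `log(area)` enter as a free predictor:

```r
## Sketch (simulated data): an offset is a predictor whose coefficient
## is pinned at 1, so the model effectively describes a rate per unit area.
library(MASS)

set.seed(1)
n <- 200
data <- data.frame(elev = rnorm(n), area = runif(n, 0.5, 2))
## Simulate counts whose mean scales proportionally with area
data$tally <- rnbinom(n, mu = exp(0.5 + 0.3 * data$elev + log(data$area)),
                      size = 2)

## Offset version: log(area) enters with its coefficient fixed at 1
m.off  <- glm.nb(tally ~ elev + offset(log(area)), data = data)
## Free-coefficient version: log(area) estimated like any other predictor
m.free <- glm.nb(tally ~ elev + log(area), data = data)

## The estimated log(area) coefficient should be close to 1 in this
## simulation, since the data were generated with exact proportionality
coef(m.free)["log(area)"]
```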
N.b., this is how I intend to compare the two models:
## Proportional increase in explained deviance (aka pseudo R^2)
# Option 1 (with offset in null)
(deviance(model.nb.null.off) - deviance(model.nb))/deviance(model.nb.null.off) * 100
# Option 2 (without offset in null)
(deviance(model.nb.null) - deviance(model.nb))/deviance(model.nb.null) * 100
Thanks for any guidance!