
I'm using a negative binomial GLMM with R package lme4 to detect differences in time mothers spend feeding before and after birth (inf_cat).

    inf30.feed <- glmer.nb(feeding ~ (inf_cat) + 
                    offset(total_inf_cat) + (1|female), 
                    data=mother_ownno_inf30)

My model has an offset of the total amount of time spent observing the individual. I'm still relatively new to GLMMs in R and I've been looking at a lot of examples online, many of which have the offset in a log scale.

Does the offset always have to be on a log scale? Why? And when is it appropriate to do?

kjetil b halvorsen
sam2210

2 Answers


Normally, an offset is used when we are modelling rate data (e.g. deaths per 100,000, crashes per 100,000, etc.).

Rate data are naturally modelled as a ratio, so the quantity of interest has the form $E(y_i)/n_i$, where $y_i$ is the observed count and $n_i$ is the corresponding exposure.

In a GLM, we relate the expectation to the linear predictor through a link function $g$, so

$$ g(E(y_i)/n_i) = \mathbf{x}^T\beta$$

With the logarithmic link function, the left-hand side is $\log(E(y_i)/n_i) = \log(E(y_i)) - \log(n_i)$, so we have

$$ \log(E(y_i)) = \mathbf{x}^T\beta + \log(n_i) $$

by the rules of logarithms. So to answer your question: the offset is not always a log. Its form depends on the link function you use, and the log offset is the one that goes with the log link.
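As a minimal sketch of the rate argument above, the following base R simulation fits a Poisson GLM with a log link and an offset of `log(exposure)`. The variable names (`exposure`, `x`, `true_rate`) and the parameter values are assumptions made up for illustration; a plain Poisson model without random effects is used here only to keep the example small.

```r
set.seed(42)

n_obs    <- 500
exposure <- runif(n_obs, min = 10, max = 1000)   # n_i: hypothetical exposure
x        <- rbinom(n_obs, size = 1, prob = 0.5)  # hypothetical binary covariate

# Simulated truth: events per unit of exposure follow a log-linear model
true_rate <- exp(-3 + 0.5 * x)
y <- rpois(n_obs, lambda = true_rate * exposure)

# Log link + offset(log(exposure)) models log(E[y] / exposure) = b0 + b1 * x
fit <- glm(y ~ x + offset(log(exposure)), family = poisson(link = "log"))

# The estimates should land close to the simulated values -3 and 0.5
coef(fit)
```

Because the offset enters with a fixed coefficient of 1 on the log scale, the coefficients are interpreted as effects on the rate rather than on the raw count.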

user257566
Demetri Pananos

This question is related to the choice of link function for your generalized linear model. McCullagh and Nelder say (page 31):

> The link function relates the linear predictor $\eta$ to the expected value $\mu$ of a datum [outcome value] $y$.

The link function is what makes this a generalized linear model. Hidden in your call to glmer.nb() is a default choice of a log link function. That is, you are (perhaps without knowing it) modeling the log of the expected value of feeding with the linear predictor. Equivalently, the expected value of feeding is found by exponentiating the linear predictor.

In the way you've written your model, the fixed-effect part* of the linear predictor would be: $\beta_0 + \beta_1\,$inf_cat $+$ total_inf_cat. Here, $\beta_0$ is the intercept, $\beta_1$ is the regression coefficient for inf_cat, and the offset restricts the coefficient of total_inf_cat to be exactly 1. So the way you have written the model, with the log link each 1-unit increase in total_inf_cat would multiply the expected value of feeding by $e$.
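To spell out that last step, write $t$ for total_inf_cat (notation introduced here only for brevity) and assume the default log link. The fixed-effect part of the model then gives

$$ \log E(\text{feeding}) = \beta_0 + \beta_1\,\text{inf\_cat} + t \quad\Longrightarrow\quad E(\text{feeding}) = e^{\beta_0 + \beta_1\,\text{inf\_cat}}\; e^{t} $$

so that, holding inf_cat fixed,

$$ \frac{E(\text{feeding} \mid t + 1)}{E(\text{feeding} \mid t)} = \frac{e^{t+1}}{e^{t}} = e $$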

Does that make sense in terms of your understanding of the subject matter? Probably not, if you think that total_inf_cat is the total available duration and that the amount of feeding should be directly proportional to total_inf_cat, other things being equal. Then the log link should be accompanied by an offset of log(total_inf_cat), to maintain that direct proportionality.
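A hedged sketch of what that could look like with `glmer.nb()` follows. The data are simulated stand-ins that only borrow the column names from the question; the sample sizes, effect sizes, and dispersion are assumptions chosen for illustration, and the fit may emit convergence warnings on other data.

```r
library(lme4)  # provides glmer.nb(); assumed to be installed

set.seed(1)
n_female <- 20  # hypothetical number of mothers
n_rep    <- 8   # hypothetical observations per mother

sim <- data.frame(
  female  = factor(rep(seq_len(n_female), each = n_rep)),
  inf_cat = factor(rep(c("before", "after"), times = n_female * n_rep / 2),
                   levels = c("before", "after")),
  total_inf_cat = runif(n_female * n_rep, min = 50, max = 200)
)

# Simulated truth: feeding is proportional to observation time, with a
# per-female random intercept on the log scale
u    <- rnorm(n_female, sd = 0.3)
rate <- exp(-1 + 0.4 * (sim$inf_cat == "after") + u[as.integer(sim$female)])
sim$feeding <- rnbinom(nrow(sim), mu = rate * sim$total_inf_cat, size = 5)

# Log link (the default) paired with an offset of log(total_inf_cat)
fit_log <- glmer.nb(feeding ~ inf_cat + offset(log(total_inf_cat)) +
                      (1 | female), data = sim)

# exp() of the inf_cat coefficient is a ratio of feeding rates per unit
# of observation time, after versus before
exp(fixef(fit_log))
```

With this specification, doubling total_inf_cat doubles the expected value of feeding, other things being equal.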

There are other link-function choices for negative binomial models, with a square-root and an identity link also available for glmer.nb(). As Demetri Pananos says in another answer, if you do choose a different link function you would have to choose a different offset to keep proportionality between feeding and total_inf_cat. For example, your model with the offset of total_inf_cat would make sense if you specified the identity link in your call to glmer.nb(). This page and its links discuss the choices. With count data, the log link typically makes the most sense.

Finally, negative binomial models are most useful with count data that have more variance than would be expected from a Poisson model, where the variance necessarily equals the mean. If feeding is a continuous variable (amount of time spent feeding) instead, you might be better off with a different type of model. But with a generalized linear model of any type, the same principle of choosing an offset to give the desired behavior combined with the link function holds.


*I assume that female represents a set of IDs of the mothers. Then the (1|female) random-effect part of the model allows for different individuals to have different intercept values.

EdM