
Let's say I'm trying to model the incidence of a disease per 1000 people. Each person in my data has an ID, for example 1001, 1002, 1003, ...

Am I allowed to specify the offset as

geeglm(y~offset(log(1000))+x_1+x_2+...x_n,family=poisson,id=id,corstr=...)?

In other words, can the offset in Poisson regression be a constant that is not in the data frame?

  • What difference do you see between what you suggest and what the documentation says? https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/offset They look the same to me, and it will help people answer if you can clarify the difference. – Dave Feb 09 '20 at 14:08
  • I'm suggesting a number, while the documentation suggests an object. R treats everything as a vector, and numbers are vectors, so what I'm suggesting will work. –  Feb 09 '20 at 14:09
  • Does your code not compile and give a model with log(1000) as the intercept? – Dave Feb 09 '20 at 14:13
  • No, I was asking a hypothetical question. –  Feb 09 '20 at 14:13
  • It seems like you are asking an R question, rather than a statistics one. Questions about programming are off-topic here. But if you can reframe your question to emphasize the statistics part (if that is what you are asking about) it will be on topic. – Peter Flom Feb 10 '20 at 11:33

1 Answer


Assuming that geeglm() uses the standard R formula-processing machinery (model.matrix() and related), the offset doesn't need to be in the data frame, but it does need to be the same length as the other variables: R does not automatically recycle/replicate the offset as it does in most other contexts.

geeglm(y ~ offset(rep(log(1000), nrow(dd))) + x_1 + ...,
       data = dd, ...)

For this reason it may be simpler (although a tiny bit wasteful) to add it to the data frame, where it will automatically be replicated appropriately:

dd$off <- log(1000)
geeglm(y ~ offset(off) + x_1 + ..., data=dd, ...)
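
As a rough check of the points above, here is a minimal sketch that uses base R's glm() as a stand-in for geeglm() (an assumption: it relies on geeglm() building its model frame the same way, which I have not verified here). The data are simulated, and the names dd, x_1, y, and off are only illustrative. It tries a bare scalar offset, fits both versions shown above, and compares them with a fit that has no offset.

```r
# Simulated data; dd, x_1, y and off are hypothetical names for illustration.
set.seed(1)
n <- 200
dd <- data.frame(x_1 = rnorm(n))
dd$y <- rpois(n, lambda = exp(0.5 + 0.3 * dd$x_1))

# A bare scalar offset: the model frame is expected to reject it because
# its length (1) differs from the length of the other variables (n).
res <- tryCatch(
  glm(y ~ offset(log(1000)) + x_1, family = poisson, data = dd),
  error = function(e) e
)
if (inherits(res, "error")) message(conditionMessage(res))

# Offset replicated explicitly to one value per row.
fit_rep <- glm(y ~ offset(rep(log(1000), nrow(dd))) + x_1,
               family = poisson, data = dd)

# Offset stored as a column; the scalar is recycled on assignment.
dd$off <- log(1000)
fit_col <- glm(y ~ offset(off) + x_1, family = poisson, data = dd)

# The same model without any offset, for comparison.
fit_none <- glm(y ~ x_1, family = poisson, data = dd)

# The two offset specifications should give the same coefficients.
print(all.equal(coef(fit_rep), coef(fit_col)))

# Difference between the no-offset and offset fits, coefficient by coefficient.
shift <- coef(fit_none) - coef(fit_col)
print(shift)
print(log(1000))
```

If this behaves as I expect, the intercept entry of shift should be close to log(1000) and the slope entry close to zero. That is, a constant offset only moves the intercept, which is the statistical side of the question: it rescales the baseline rate and leaves the covariate effects unchanged.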
Ben Bolker