
I was wondering, from a technical perspective, which approach I should follow for a modelling problem I have. My target variable Y is a continuous random variable defined on the interval [0, ∞). For this reason (and this is also supported by the data itself) I decided to use a Tweedie distribution. I would also like a multiplicative model, so I am using a log link function. I also know that Y depends linearly on the time variable: the longer the time, the higher the value of Y. Given these conditions, I followed two different approaches:

  1. Modelling Y directly, using time as a log offset. In R syntax the model would be glm(Y ~ X1 + X2 + ... + offset(log(time)), family = tweedie(link = "log"))
  2. Modelling the ratio of Y to time, using time as the weights. Defining Y_time = Y / time, the model is glm(Y_time ~ X1 + X2 + ..., weights = time, family = tweedie(link = "log"))
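The two specifications above can be compared through the mean and variance each one implies for Y_i and through their score equations for β. This is a sketch under stated assumptions: the standard Tweedie mean-variance relation Var = φ m^p with a fixed power p, a log link, and prior weights that divide the dispersion (as the weights argument of R's glm does). The symbols t_i (time for observation i), μ_i = exp(x_iᵀβ), φ and p are notation introduced here, not taken from the question.

```latex
\begin{aligned}
&\text{Approach 1 (offset } \log t_i\text{):}
  && E[Y_i] = t_i\mu_i, \quad \operatorname{Var}(Y_i) = \phi\,(t_i\mu_i)^{p},\\
& && \textstyle\sum_i x_i\,(y_i - t_i\mu_i)\,(t_i\mu_i)^{1-p} = 0,\\[4pt]
&\text{Approach 2 (} Y_i/t_i \text{, weights } w_i = t_i\text{):}
  && E[Y_i] = t_i\mu_i, \quad \operatorname{Var}(Y_i) = t_i^{2}\,\frac{\phi}{t_i}\,\mu_i^{p} = \phi\,t_i\,\mu_i^{p},\\
& && \textstyle\sum_i x_i\,(y_i - t_i\mu_i)\,\mu_i^{1-p} = 0.
\end{aligned}
```

Under these assumptions both approaches give the same mean, but the implied variances differ by a factor of t_i^{p-1}, so the two fits coincide only when p = 1 (the Poisson case). Using weights w_i = t_i^{2-p} in approach 2 instead of w_i = t_i reproduces both the variance and the score equations of approach 1.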

Which approach is more theoretically sound?

  • Do some of these old posts help? https://stats.stackexchange.com/questions/246318/difference-between-offset-and-weights, https://stats.stackexchange.com/questions/326525/gamma-mixed-model-with-offset-and-or-weights, https://stats.stackexchange.com/questions/297859/can-weights-and-offset-lead-to-similar-results-in-poisson-regression, https://stats.stackexchange.com/questions/358980/can-i-model-incidence-per-1000-people-per-month-using-poisson-regression-without – kjetil b halvorsen Sep 17 '20 at 16:43
