
My GLM is as follows:

```r
logit.final <- glm(Claim_Occurrence ~ Sum.Insured100kto200k + Sum.Insured200kto300k +
                                      Sum.Insured30kto50k   + Sum.Insured50kto100k +
                                      Sum.Insured300Kplus,
                   family = binomial(link = "logit"), offset = Exposure.Years.Earned)
```

I am trying to predict whether or not a claim will be reported for a vehicle, based on its sum insured. The base level of the Sum.Insured categorical variable is Sum.Insured0to30K. Exposure years (`Exposure.Years.Earned`) is the offset term, and it lies between 0 and 1: for example, 0.5 means 6 months and 1 means a full year.

If the fitted intercept is -2.64997, does this mean the odds of a claim occurring for a vehicle with sum insured 0 to 30K are 7.07% (i.e., $\exp(-2.64997)$)? Would the offset term have any influence on these odds or on this interpretation?

EDIT:

I read somewhere that the coefficient of an offset is fixed at 1. So, to incorporate the offset into my interpretation, would the odds be $\exp(-2.64997 + 1) = 19\%$?

EDIT 2:

Okay, as per the advice in the answer, I have removed `Exposure.Years.Earned` from the offset and included it as a predictor instead.

My revised GLM is now as follows:

```r
logit.final <- glm(Claim_Occurrence ~ Sum.Insured100kto200k + Sum.Insured200kto300k +
                                      Sum.Insured30kto50k   + Sum.Insured50kto100k +
                                      Sum.Insured300Kplus   + Exposure.Years.Earned,
                   family = binomial(link = "logit"))
```

My intercept is now -3.6464, and the coefficient estimate for `Exposure.Years.Earned` is 2.0046.

So, if I want to find the probability of a claim for a vehicle with sum insured 0 to 30K and 1.083 exposure years earned, would it be $\exp(-3.6464) \times \exp(2.0046) \times 1.083 = 20.98\%$?

user295559
    The answer from @Gung notes that an offset probably isn't helpful here. In your formula at the end of the question, you would have to multiply the fixed coefficient of 1 by the number of years involved for a case. An offset for exposure-years would thus mean you are assuming that the _log-odds_ of a claim increases by a value of _exactly_ 1 _per year_. That doesn't seem to make sense. Also, you might want to consider a Poisson model (with an offset) instead, as your binomial model doesn't distinguish between having 1 claim and having 10 claims. The number of claims seems to be worth modeling. – EdM Sep 16 '20 at 20:56
  • First, it's always safest to calculate the linear predictor (`Intercept + coefficient * predictorValue`) before you do any exponentials. The value I get for the linear predictor is then about -1.475, which exponentiated is about 0.229. Your value of 0.2098 came from exponentiating the coefficient for exposure-years before multiplying by the number of exposure-years. Second, the exponentiated result is the estimated _odds_ of the outcome, not the probability. – EdM Sep 20 '20 at 16:15
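
The order of operations in that comment can be sketched in R, using the rounded coefficients exactly as quoted in EDIT 2 (so the last digit may differ slightly from a calculation on unrounded estimates). The names `b0`, `b_exp` and `exposure` are just illustrative local variables, not from the original model object:

```r
# Coefficients as quoted (rounded) in EDIT 2 of the question
b0       <- -3.6464  # intercept: base level Sum.Insured0to30K
b_exp    <-  2.0046  # coefficient for Exposure.Years.Earned
exposure <-  1.083   # exposure years for the vehicle of interest

# Correct: form the linear predictor (log odds) first, then transform it
eta  <- b0 + b_exp * exposure
odds <- exp(eta)     # estimated odds of a claim
prob <- plogis(eta)  # inverse logit, i.e. odds / (1 + odds)
round(c(eta = eta, odds = odds, prob = prob), 3)
# eta -1.475, odds 0.229, prob 0.186

# The calculation in the question exponentiates before multiplying by the
# exposure, which amounts to exp(b0 + b_exp) * 1.083 instead
wrong <- exp(b0) * exp(b_exp) * exposure
```

`plogis()` from base R's stats package is the inverse logit, so it goes straight from the log odds to the probability without a separate odds step.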



An offset is just a variable whose coefficient in the fitted model is forced to be exactly $1$. You can use an offset any time there is sufficient justification for that. You can also use it to fix a coefficient at a different value: multiply the variable by the desired value and enter the product as an offset, so that the product has a coefficient of $1$. In general, though, this should only be done when there is a really strong justification.
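A minimal sketch of the two uses described above, on simulated data (the variables `x`, `z`, `y` and the fixed value 0.3 are hypothetical, chosen only for illustration): `offset(z)` puts `z` into the linear predictor with its coefficient fixed at 1, and `offset(0.3 * z)` fixes it at 0.3 instead. In both cases `z` gets no estimated coefficient of its own.

```r
set.seed(1)
n <- 500
x <- runif(n)  # hypothetical predictor whose coefficient is estimated
z <- runif(n)  # hypothetical variable whose coefficient we want to fix
y <- rbinom(n, 1, plogis(-1 + 0.5 * x + z))

# offset(z): z enters the linear predictor with coefficient fixed at 1
fit_one <- glm(y ~ x + offset(z), family = binomial(link = "logit"))

# offset(0.3 * z): the same trick fixes the coefficient of z at 0.3
fit_fixed <- glm(y ~ x + offset(0.3 * z), family = binomial(link = "logit"))

coef(fit_one)  # only (Intercept) and x are estimated; z has no coefficient

# The fitted linear predictor is intercept + slope * x + the fixed offset
b <- coef(fit_one)
all.equal(unname(predict(fit_one)), unname(b[1] + b[2] * x + z))
```

`predict()` on a fitted `glm` with no `newdata` returns the linear predictor including the offset, which is why the last line reconstructs it exactly.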

Offsets have a special role to play in models for count data (e.g., Poisson regression or negative binomial regression). In that case, because the link function is the logarithm, and due to the nature of counts, using the log of the exposure as an offset allows you to model rates (for more information, see: When to use an offset in a Poisson regression?). As a result, offsets are most common in count models.
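A short sketch of why this works, in notation assumed here rather than taken from the question: let $Y_i$ be the claim count for case $i$, $t_i$ its exposure time, and $\mathbf{x}_i^\top\boldsymbol\beta$ its linear predictor. Entering $\log t_i$ as an offset in a log-link model gives

$$
\log \mathbb{E}[Y_i] = \log t_i + \mathbf{x}_i^\top\boldsymbol\beta
\quad\Longleftrightarrow\quad
\frac{\mathbb{E}[Y_i]}{t_i} = \exp\left(\mathbf{x}_i^\top\boldsymbol\beta\right),
$$

so $\exp(\mathbf{x}_i^\top\boldsymbol\beta)$ is the expected number of claims per unit of exposure, i.e., a rate. Note that the offset is $\log t_i$, on the scale of the link function, rather than $t_i$ itself.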

In your case, I suspect that using an offset is not ideal, and you would be better off just using exposure years as a regular variable. To be explicit: you are not modeling rates in your context, and including the offset does not change how the coefficients are interpreted.
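If you want to check this empirically, here is a minimal sketch on simulated data (the variables `exposure`, `band`, `y` and the true coefficients are hypothetical, not the question's data). Because the offset model is just the regular-predictor model with the exposure coefficient constrained to 1, the two models are nested and can be compared with a likelihood-ratio test:

```r
set.seed(42)
n <- 1000
# Hypothetical data: exposure in years and a sum-insured band
exposure <- runif(n, 0, 1)
band <- factor(sample(c("0to30K", "30kto50k", "50kto100k"), n, replace = TRUE))
# Simulated so that the true exposure coefficient is 2, not 1
y <- rbinom(n, 1, plogis(-2.5 + 2 * exposure + 0.3 * (band == "50kto100k")))

# Offset model: exposure coefficient constrained to exactly 1
fit_offset <- glm(y ~ band + offset(exposure), family = binomial(link = "logit"))
# Regular-predictor model: exposure coefficient estimated from the data
fit_free <- glm(y ~ band + exposure, family = binomial(link = "logit"))

coef(fit_free)["exposure"]
# Likelihood-ratio test of H0: exposure coefficient == 1 (1 degree of freedom)
anova(fit_offset, fit_free, test = "Chisq")
```

A small p-value would suggest that fixing the exposure coefficient at 1 is not supported by the data.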

To answer your stated question: the intercept still means the same thing. It is the log odds of a claim when all other variables, including the offset, are exactly equal to 0. Thus, when the sum insured is 0 to 30K and exposure years is exactly 0, the odds of a claim are $\exp(-2.64997) \approx 0.0707$.
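A worked version of that arithmetic, assuming the offset model from the question (so `Exposure.Years.Earned`, written $E$ here, enters the log odds with a coefficient of exactly 1). Odds convert to a probability via $p = \text{odds}/(1 + \text{odds})$, and each extra exposure year adds exactly 1 to the log odds:

$$
\begin{aligned}
\log\frac{p}{1-p} &= -2.64997 + 1 \times E \\
E = 0: \quad p &= \frac{e^{-2.64997}}{1 + e^{-2.64997}} \approx 0.066 \\
E = 1: \quad \frac{p}{1-p} &= e^{-2.64997 + 1} = e^{-1.64997} \approx 0.192, \qquad p \approx 0.161
\end{aligned}
$$

So $\exp(-2.64997 + 1)$ is the estimated odds (not the probability) for a base-level vehicle with exactly one exposure year.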

gung - Reinstate Monica