I have count data on the frequency of crime by day of the week and location. I fitted a GLM with a Poisson distribution to model the data.

glm(freq~DayOfWeek+Location,family='poisson')

Would this be appropriate? If I wanted to check whether crime is more frequent on a specific day in a specific area, could I just use:

glm(freq~DayOfWeek*Location,family='poisson')

Nick Cox
Ted Mosby

1 Answer

It depends on what exactly you want and on how you entered your predictors ('DayOfWeek' and 'Location'). For this answer I assume both are categorical (aka factor) variables. For this example I'll take Sunday as the reference level for 'DayOfWeek' and location 1 as the reference level for 'Location' (note that you can change these).

If you use the glm as in your first line of code, you get the effect of 'DayOfWeek' conditional on 'Location' being held constant (at least, these are the semantics I was taught). This may sound odd, but it just means that, relative to a Sunday, a Monday multiplies the expected crime count by a factor of exp(coefficient for Monday) at any given location. The same goes for any 'Location' while keeping 'DayOfWeek' constant. In some studies this is called the 'independent' effect, or an effect corrected for confounding or overlapping information. I refrain from these terms because they depend on a lot of side information (such as the goal of the study) and assumptions (unknown confounders, other predictors, etc.).
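To make the rate-ratio reading concrete, here is a minimal sketch in R on simulated data. The data frame `crime`, the effect sizes, and the number of locations and weeks are all made up for illustration; only the model formula comes from the question.

```r
# Simulated data: every name and number below is an assumption for illustration.
set.seed(1)
days <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
crime <- expand.grid(
  DayOfWeek = factor(days, levels = days),  # first level (Sun) is the reference
  Location  = factor(paste0("loc", 1:4)),   # loc1 is the reference
  week      = 1:20                          # 20 replicate weeks per day/location cell
)

# Assumed additive effects on the log scale (no interaction in the truth)
day_eff <- c(0, 0.3, 0.1, 0.1, 0.2, 0.5, 0.6)
loc_eff <- c(0, 0.4, -0.2, 0.8)
mu <- exp(1 + day_eff[as.integer(crime$DayOfWeek)] +
              loc_eff[as.integer(crime$Location)])
crime$freq <- rpois(nrow(crime), mu)

# Main-effects model, as in the first glm call of the question
fit_add <- glm(freq ~ DayOfWeek + Location, family = poisson, data = crime)

# exp(coefficient) is the multiplicative change in the expected count
# relative to the reference level, holding the other factor fixed
rate_ratios <- exp(coef(fit_add))
print(round(rate_ratios, 2))
```

Here `rate_ratios["DayOfWeekMon"]` estimates how many times more crime is expected on a Monday than on a Sunday at the same location.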

However, if you suspect the effect of 'DayOfWeek' depends on 'Location' (e.g. Wednesday is discount day for location 3's drug dealer), you might want to check for different effects for different pairs of predictor values. This implies effect modification, which can be modelled by adding an interaction term. Your second glm call already does this: in R's formula syntax, DayOfWeek*Location is shorthand for DayOfWeek + Location + DayOfWeek:Location. You have a couple of options depending on what you want to see:

  1. Do you want the effect of both predictors separately and then see how the interaction changes the effect for certain combinations? Then use:

    glm(freq~DayOfWeek+Location+DayOfWeek:Location,family='poisson')

  2. Do you want the effect for the combinations of predictors only? Then use:

    glm(freq~DayOfWeek:Location,family='poisson')
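The two options can be compared side by side. The sketch below again uses simulated data (the `crime` data frame, the baseline rate of 5, and the Wednesday/loc3 "hot spot" are assumptions, not the asker's data). It checks that `DayOfWeek*Location` and the spelled-out formula give the same coefficients, that the `:`-only formula gives the same fitted model under a different parameterisation, and it tests the interaction with a likelihood-ratio test.

```r
# Simulated data with one day/location cell that behaves differently (assumed values)
set.seed(2)
days <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
crime <- expand.grid(
  DayOfWeek = factor(days, levels = days),
  Location  = factor(paste0("loc", 1:3)),
  week      = 1:30
)
mu <- rep(5, nrow(crime))
hot <- crime$DayOfWeek == "Wed" & crime$Location == "loc3"
mu[hot] <- 15                       # interaction: Wednesday at loc3 is busier
crime$freq <- rpois(nrow(crime), mu)

fit_add  <- glm(freq ~ DayOfWeek + Location, family = poisson, data = crime)
fit_star <- glm(freq ~ DayOfWeek * Location, family = poisson, data = crime)
fit_long <- glm(freq ~ DayOfWeek + Location + DayOfWeek:Location,
                family = poisson, data = crime)
fit_cell <- glm(freq ~ DayOfWeek:Location, family = poisson, data = crime)

# a*b expands to a + b + a:b, so these two fits have identical coefficients
print(all.equal(coef(fit_star), coef(fit_long)))

# The ':'-only model has one parameter per cell: same deviance, different
# coefficients (R reports one aliased coefficient as NA because the intercept
# is still in the model)
print(all.equal(deviance(fit_star), deviance(fit_cell), tolerance = 1e-6))

# Likelihood-ratio test of the interaction against the main-effects model
print(anova(fit_add, fit_star, test = "Chisq"))
```

Because options 1 and 2 describe the same fitted values, the choice between them is mostly about which coefficients are easier to read: deviations from main effects (option 1) or one effect per day/location combination (option 2).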

Do note that interaction terms can cost a lot of degrees of freedom ($df$), especially with categorical variables: $df_{interaction} = df_{predictor1} \times df_{predictor2}$, where a categorical predictor with $k$ levels has $k-1$ degrees of freedom. They may therefore require substantial amounts of data.
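As a worked example with assumed numbers (7 days and 10 locations, the latter chosen purely for illustration), and with each factor contributing one $df$ per non-reference level:

$$df_{DayOfWeek} = 7 - 1 = 6, \qquad df_{Location} = 10 - 1 = 9, \qquad df_{interaction} = 6 \times 9 = 54$$

Together with the intercept, the full model then estimates $1 + 6 + 9 + 54 = 70$ parameters, i.e. one per day/location cell ($7 \times 10 = 70$), compared with $1 + 6 + 9 = 16$ for the main-effects model.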

P.S. Anyone with a better grasp of the terminology, feel free to edit.

IWS