I have fitted a mixed model for the number of jobs in different regions, with the random effects shown below. The EDA suggested that the number of jobs varies over months within each year, and that the number of jobs per year varies between regions, which is my justification for adding the random effects.

glmer.nb(Jobs ~ Month + Year + Region + (1+Month|Year) + (1+Year|Region), data = df)

Looking at the dotplots, do they suggest that the random slopes are unnecessary because there is little variation? Or do they support including random slopes because the confidence intervals are narrow?

[Two images: dotplots of the random effects]

user553480

1 Answer

The code you posted for your model,

glmer.nb(Jobs ~ Month + Year + Region + (1+Month|Year) + (1+Year|Region), data = df)

looks odd to me. Year appears three times: as a fixed effect, as a grouping factor with its own random intercept in (1+Month|Year), and as a random slope in (1+Year|Region). It can plausibly play two of those roles, but having it play all three is a real stretch. So the question is which two of the three make the most sense.

The EDA suggested the number of jobs vary over months for each year, and the number of jobs a year varies between regions, which is my justification for adding the random effects.

Without more information, it seems you have the number of jobs measured yearly within a region. This suggests a longitudinal model is appropriate; however, figuring out how to incorporate time into your model is critical. Do you think years have unique effects on the number of jobs? Or is year better thought of as indexing a trend in job growth?

  1. Year as a unique effect.

If you believe each year has its own unique effect on jobs, irrespective of region, and you have 5 or more years in your data, then you could model it like this:

glmer.nb(Jobs ~ Month + (1+Month|Year) + (1|Region), data = df)

This is sometimes called a two-way error components model in econometrics. This model allows for the effect of Month on job creation to vary by year. You could also allow the effect of month to vary by region (1+Month|Region).
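As a rough sketch of what option 1 implies, the model can be written out as below. For brevity this assumes Month enters as a single numeric covariate (with a factor Month there would be one coefficient and one random slope per non-reference month). The indices r, y, m (region, year, month) and the symbols $\beta$, $a$, $b$, $\theta$ and $\Sigma$ are my own notation, not lme4 output, and the log link is the default for glmer.nb:

```latex
\begin{aligned}
\text{Jobs}_{rym} &\sim \text{NegBin}(\mu_{rym}, \theta) \\
\log \mu_{rym} &= \beta_0 + \beta_1\,\text{Month}_m + a_{0y} + a_{1y}\,\text{Month}_m + b_r \\
(a_{0y}, a_{1y})^\top &\sim N(\mathbf{0}, \Sigma_{\text{Year}}), \qquad b_r \sim N(0, \sigma^2_{\text{Region}})
\end{aligned}
```

Here $a_{1y}$ is the year-specific deviation in the Month effect, which is what (1+Month|Year) adds over a plain (1|Year).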

  2. Year indexing a trend

This is the classic growth model, in which year is used to capture the trend in job growth over the panel period. It could be modeled as follows:

glmer.nb(Jobs ~ Month + Year + (1+Year|Region), data = df)

In this model, Year should be coded as a numeric variable, not a factor. This model allows for a linear effect of year on jobs, which varies from region to region. Put differently, you expect this time trend in jobs to be more correlated within regions than across regions. You could allow the trend to be non-linear, if that were appropriate. Month in this model should be treated as a factor variable, providing you with a set of coefficients that compare the average number of jobs created in a given month with the average in whatever the reference month is.
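To make the coding concrete, here is a minimal sketch of the growth model on made-up data. The simulated panel (14 regions, 10 years, 12 months), the parameter values, and the centered variable Year_c are all assumptions for illustration. Centering Year is my own addition to keep the intercept interpretable and the optimizer on a sensible scale, and the fit may still emit convergence warnings:

```r
library(lme4)  # provides glmer.nb()

set.seed(1)

# Hypothetical panel: 14 regions x 10 years x 12 months (made-up data).
df <- expand.grid(Region = paste0("R", 1:14), Year = 2010:2019, Month = 1:12)

# Coding discussed above: Year numeric (a trend), Month a factor (monthly contrasts).
df$Year   <- as.numeric(df$Year)
df$Month  <- factor(df$Month, levels = 1:12)
df$Region <- factor(df$Region)

# Assumption: center Year so the intercept refers to the middle of the panel.
df$Year_c <- df$Year - mean(df$Year)

# Simulate counts with a region-specific intercept and trend (illustrative values only).
r_int   <- rnorm(14, 0, 0.3)
r_slope <- rnorm(14, 0, 0.05)
idx     <- as.integer(df$Region)
eta     <- 5 + r_int[idx] + (0.03 + r_slope[idx]) * df$Year_c
df$Jobs <- rnbinom(nrow(df), mu = exp(eta), size = 10)

# Growth model: overall linear trend in Year plus region-specific deviations from it.
fit <- glmer.nb(Jobs ~ Month + Year_c + (1 + Year_c | Region), data = df)

fixef(fit)["Year_c"]  # average trend on the log scale
VarCorr(fit)          # between-region variation in intercepts and trends
```

With real data you would skip the simulation step and apply the same as.numeric() / factor() coding to your own columns before fitting.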

There are other options, of course, but these are two of the more common ways people model longitudinal data.

Erik Ruzek
  • Thank you for your reply. What I am specifically interested in is the variation in jobs over time between 14 regions, and the variation between 10 years (with values for each month in the year). With that in mind, would you recommend model 1 (year as unique effect) as more appropriate for me? – user553480 Jun 17 '20 at 19:11
  • Yes, model 1 seems like the right option for you. You may get pushback on it, depending on your field of study. Check out two-way error components models, as suggested in my response, for more information on modeling longitudinal data this way. – Erik Ruzek Jun 18 '20 at 18:25