The code you posted for your model,
glmer.nb(Jobs ~ Month + Year + Region + (1+Month|Year) + (1+Year|Region), data = df)
seems a bit weird to me. Year plays three roles here: it is a fixed effect, it is a grouping factor with its own random intercept, and it is a random slope across regions. It can be two of those things, but making it all three is a real stretch. So the question is which two of the three make the most sense.
You wrote: "The EDA suggested the number of jobs vary over months for each year, and the number of jobs a year varies between regions, which is my justification for adding the random effects."
Without more information, it seems you have the number of jobs measured yearly within a region. This suggests a longitudinal model is appropriate; however, figuring out how to incorporate time into your model is critical. Do you think years have unique effects on the number of jobs? Or is year better thought of as indexing a trend in job growth?
- Year as a unique effect.
If you believe each year has its own unique effect on jobs, irrespective of region, and you have 5 or more years in your data, then you could model it like this:
glmer.nb(Jobs ~ Month + (1+Month|Year) + (1|Region), data = df)
This is sometimes called a two-way error components model in econometrics. This model allows for the effect of Month on job creation to vary by year. You could also allow the effect of month to vary by region with (1+Month|Region).
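As a rough sketch of how the data might be set up for this first option, the snippet below builds a toy df (simulated counts, not your data), codes Year and Region as factors, and checks that there are enough years to treat Year as a grouping factor. The name fit_year_effects is hypothetical, and Month is left numeric here as an assumption: if Month were a 12-level factor, (1+Month|Year) would need a 12 x 12 covariance matrix, which is a lot to estimate from a handful of years.

```r
# Toy data standing in for df (hypothetical values, not real job counts).
set.seed(1)
df <- expand.grid(Month = 1:12, Year = 2010:2015, Region = c("A", "B", "C", "D"))
df$Jobs <- rnbinom(nrow(df), size = 2, mu = 50)

# Year and Region act as grouping factors in this specification.
df$Year <- factor(df$Year)
df$Region <- factor(df$Region)

# Rule of thumb from the text: 5 or more years before treating Year as random.
n_years <- nlevels(df$Year)

# Year and Region are crossed, so every Year x Region cell should be populated.
crossing <- table(df$Year, df$Region)

# Wrapper around the model from the text; requires the lme4 package.
# Month is numeric in this sketch (an assumption, see the note above).
fit_year_effects <- function(data) {
  lme4::glmer.nb(Jobs ~ Month + (1 + Month | Year) + (1 | Region), data = data)
}
```

Calling fit_year_effects(df) would fit the model. With simulated data like this, which has no real year or region structure, singular-fit or convergence warnings would not be surprising.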
- Year indexing a trend
This is the classic growth model, in which year is used to capture the trend in job growth over the panel period, and could be modeled as such:
glmer.nb(Jobs ~ Month + Year + (1+Year|Region), data = df)
In this model, Year should be coded as a numeric, not a factor variable. This model allows for a linear effect of year on jobs, which varies from region to region. Put differently, you expect this time trend in jobs to be more correlated within regions than across regions. You could allow the trend to be non-linear, if such were appropriate. Month in this model should be treated as a factor variable, providing you with a set of coefficients that compare the average number of jobs created for a given month relative to whatever the reference month is.
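The coding described above might look something like the following. This is only a sketch on a toy df (simulated counts, not your data). Year_c and fit_trend are hypothetical names, and centering Year on the first year of the panel is my own choice so that the intercept refers to the start of the panel rather than to year 0.

```r
# Toy data standing in for df (hypothetical values, not real job counts).
set.seed(1)
df <- expand.grid(Month = 1:12, Year = 2010:2015, Region = c("A", "B", "C", "D"))
df$Jobs <- rnbinom(nrow(df), size = 2, mu = 50)

# Year as a numeric trend, shifted so the first year of the panel is 0.
df$Year_c <- as.numeric(df$Year) - min(df$Year)

# Month as a factor: one coefficient per month relative to the reference month.
df$Month <- factor(df$Month, levels = 1:12, labels = month.abb)
reference_month <- levels(df$Month)[1]

# Region is the grouping factor for the random intercept and random trend.
df$Region <- factor(df$Region)

# Wrapper around the growth model from the text; requires the lme4 package.
fit_trend <- function(data) {
  lme4::glmer.nb(Jobs ~ Month + Year_c + (1 + Year_c | Region), data = data)
}
```

Calling fit_trend(df) would fit the model. Using relevel() on Month changes which month the other coefficients are compared against.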
There are other options, of course, but these are two of the more common ways people model longitudinal data.