I have a dataset collected over multiple years, with many samples collected per year. The samples are assumed to be random within each year (collected over a short period of time, over a large area, in a randomized spatial design). I want to assess the overall trend over time while also accounting for variation between years. My current approach is to fit a smoother over year (as numeric) and use year (as a factor) as a random intercept. The gam fit shows serious concurvity, stemming from the collinearity between the continuous fixed variable and the categorical random variable. When I run the same model using glmmTMB, it doesn't seem to have any problems. Below is a toy example, with data replicating the trends to the best of my ability.
Any advice would be appreciated. Is my gam specification not the right way to fit this model? Should I be modeling this setup completely differently? Is there something wrong in how I'm assessing concurvity?
library(dplyr)
library(mgcv)
library(ggplot2)
library(glmmTMB)
library(performance)
# 25 years, 5 samples per year ("years" rather than "T", which masks the TRUE shorthand)
years <- seq(0, 24, by = 1)
x <- rep(years, each = 5)
y <- 1 + 10*cos(2*(pi/3)*(x - 10)) + 2 * x
set.seed(0)
df <- data.frame(x = x, mu = y) %>%
  group_by(x) %>%
  mutate(xjitter = sample(-10:10, 1)) %>%
  ungroup() %>%
  mutate(mu1 = mu + xjitter,
         y = mu1 + rnorm(n(), 0, 1),
         xFac = factor(x))
# overall linear trend plus a short-period cosine, with noise between years (= x values)
# and replication within year
ggplot(df) +
  geom_point(aes(x = x, y = y))
m.gam <- gam(y ~ s(x, k = 10) + s(xFac, bs = "re"), data = df)
concurvity(m.gam) # full collinearity with x and xFac!
performance::check_collinearity(m.gam) # this is extra impressive!
df$pred <- predict(m.gam, exclude = 's(xFac)')
df$pred.random <- predict(m.gam)
# the predictions seem ok
ggplot(df) +
  geom_point(aes(x = x, y = y)) +
  geom_point(aes(x = x, y = pred), colour = "red") +
  geom_point(aes(x = x, y = pred.random), colour = "blue")
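To see which pair of terms drives the concurvity reported above, it may help to ask mgcv for pairwise measures instead of the default "each term against the rest of the model". The following is only a sketch on hypothetical stand-in data (the objects `yr`, `d`, `m`, and `cc` are made up for illustration and are not the data from this post); `full = FALSE` is the documented argument of `concurvity()` that switches to term-by-term matrices.

```r
library(mgcv)

# Hypothetical stand-in data: 25 years, 5 samples per year,
# a linear trend plus a year-level offset and within-year noise
set.seed(1)
yr <- rep(0:24, each = 5)
d <- data.frame(x = yr, xFac = factor(yr))
d$y <- 1 + 2 * d$x + rnorm(25, 0, 5)[yr + 1] + rnorm(nrow(d))

# Same structure as in the post: smooth of year plus a random intercept for year
m <- gam(y ~ s(x, k = 10) + s(xFac, bs = "re"), data = d, method = "REML")

# full = FALSE returns a list of matrices (worst, observed, estimate),
# each holding pairwise concurvity between model terms
cc <- concurvity(m, full = FALSE)
round(cc$worst, 2)
```

Because every value of x corresponds to exactly one level of xFac, any function of x can be reproduced by the year dummies, so a high value for the s(x) and s(xFac) pair is expected by construction rather than a sign that the call is wrong.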
Now the same, but with glmmTMB. No collinearity issues are reported, apparently because the random factor is not included in the assessment of collinearity.
m.tmb <- glmmTMB(y ~ x + (1|xFac), data = df)
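# setting the grouping variable to NA gives population-level predictions
new <- df %>%
  mutate(xFac = NA)
new$pred <- predict(m.tmb, newdata = new)
# predictions including the estimated year effects
new$pred.random <- predict(m.tmb)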
ggplot(df) +
  geom_point(aes(x = x, y = y)) +
  geom_point(data = new, aes(x = x, y = pred), colour = "red") +
  geom_point(data = new, aes(x = x, y = pred.random), colour = "blue")
performance::check_collinearity(m.tmb)
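As a side check on the glmmTMB predictions, the population-level values obtained by setting xFac to NA should match what `predict()` returns with `re.form = NA`, which the glmmTMB documentation describes as the other way to zero out the random effects. This is a minimal sketch on hypothetical stand-in data (`d`, `fit`, `d0`, `p_na`, and `p_reform` are illustrative names, not objects from the post).

```r
library(glmmTMB)

# Hypothetical stand-in data: 25 years, 5 samples per year
set.seed(2)
yr <- rep(0:24, each = 5)
d <- data.frame(x = yr, xFac = factor(yr))
d$y <- 1 + 2 * d$x + rnorm(25, 0, 5)[yr + 1] + rnorm(nrow(d))

# Linear trend in year with a random intercept per year, as in the post
fit <- glmmTMB(y ~ x + (1 | xFac), data = d)

# Route 1: grouping variable set to NA in newdata
d0 <- d
d0$xFac <- NA
p_na <- predict(fit, newdata = d0)

# Route 2: drop the random effects via re.form
p_reform <- predict(fit, re.form = NA)

# Both should be the fixed-effect line only
all.equal(unname(p_na), unname(p_reform))
```

If the two routes agree, the red points in the plot above can be read as the fixed-effect trend alone, with the blue points adding the estimated year offsets.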