I've been working through a reproducible example to better understand AR1 covariance matrix using the glmmTMB
package. I have a couple of questions, even if only one can be answered I would greatly appreciate it.
Here is some synthetic data, many thanks to Ben Bolker for this code:
library(glmmTMB)
library(MASS)
simCor1 <- function(phi=0.8,sdgrp=2,sdres=1,
npergrp=20,ngrp=20,
seed=NULL,
## set linkinv/simfun for GLMM sims
linkinv=identity,
simfun=identity) {
if (!is.null(seed)) set.seed(seed)
cmat <- sdres*phi^abs(outer(0:(npergrp-1),0:(npergrp-1),"-"))
errs <- MASS::mvrnorm(ngrp,mu=rep(0,npergrp),Sigma=cmat)
ranef <- rnorm(ngrp,mean=0,sd=sdgrp)
d <- data.frame(f=rep(1:ngrp,each=npergrp))
eta <- ranef[as.numeric(d$f)] + c(t(errs)) ## unpack errors by row
mu <- linkinv(eta)
d$y <- simfun(mu)
d$tt <- factor(rep(1:npergrp,ngrp))
return(d)
}
d <- simCor1(phi = 0.8, seed = 100)
Here is a basic AR1 model in glmmTMB
:
mod <- glmmTMB(y~1 + (1|f) + ar1(tt-1|f), data=d,family=gaussian)
On analysing the model fit, it does a good job of identifying the phi parameter with an estimate of 0.85:
summary(mod)
...
Conditional model:
Groups Name Variance Std.Dev. Corr
f (Intercept) 4.15645 2.0387
f.1 tt1 1.06157 1.0303 0.85 (ar1)
Residual 0.04183 0.2045
Number of obs: 400, groups: f, 20
...
Here are my questions:
1) Why does the package notation require time tt
as a random slope for ar1()
to work? One consequence of this seems to be when I use the predict()
function it seems to have the retrospective benefit of knowing how each group performed at each time interval, leading to predictions very close to the actual value y
. Using a reproducible example where phi=0
:
d <- simCor1(phi = 0, npergrp = 100, seed = 50)
ar1mod <- glmmTMB(y~1 + (1|f) + ar1(tt-1|f), data=d, family=gaussian)
simpmod <- glmmTMB(y~1 + (1|f), data=d, family=gaussian)
Even though phi is estimated to be effectively 0 by ar1mod
and therefore concluding that the ar1 component of the model is effectively useless, the predictions between the two models seem wildly different:
d$errar1 <- d$y - predict(ar1mod)
d$errsimp <- d$y - predict(simpmod)
sqrt(mean(d$errar1 ^ 2))
## 0.0005218347
sqrt(mean(d$errsimp ^ 2))
## 0.9983432
2) In seeking to understand why tt
is used as a random slope, I tried to recreate the calculation that glmmTMB
must be doing to determine its estimate of phi
. I thought that, perhaps, by taking the mean coefficient of tt
at each time interval, and then determining the autocorrelation between the lagged tt
values, we'd arrive at the glmmTMB
estimate of the phi
parameter. Using mod
from earlier to try to recreate the estimated .85 parameter:
MeanTT <- colMeans(ranef(mod)$cond$f)[-1]
MeanTTlag1 <- MeanTT[-1]
MeanTTlag0 <- MeanTT[-length(MeanTT)]
cor(MeanTTlag1, MeanTTlag0)
## 0.430239
Well short of the 0.85. Why does this calculation not arrive at the 0.85 value and what is glmmTMB
instead doing to calculate phi
?