7

I've been working through a reproducible example to better understand AR1 covariance matrix using the glmmTMB package. I have a couple of questions, even if only one can be answered I would greatly appreciate it.

Here is some synthetic data, many thanks to Ben Bolker for this code:

library(glmmTMB)
library(MASS)    
simCor1 <- function(phi=0.8,sdgrp=2,sdres=1,
                    npergrp=20,ngrp=20,
                    seed=NULL,
                    ## set linkinv/simfun for GLMM sims
                    linkinv=identity,
                    simfun=identity) {
  if (!is.null(seed)) set.seed(seed)
  cmat <- sdres*phi^abs(outer(0:(npergrp-1),0:(npergrp-1),"-"))
  errs <- MASS::mvrnorm(ngrp,mu=rep(0,npergrp),Sigma=cmat)
  ranef <- rnorm(ngrp,mean=0,sd=sdgrp)
  d <- data.frame(f=rep(1:ngrp,each=npergrp))
  eta <- ranef[as.numeric(d$f)] + c(t(errs)) ## unpack errors by row
  mu <- linkinv(eta)
  d$y <- simfun(mu)
  d$tt <- factor(rep(1:npergrp,ngrp))
  return(d)
}
d <- simCor1(phi = 0.8, seed = 100)

Here is a basic AR1 model in glmmTMB:

mod <- glmmTMB(y~1 + (1|f) + ar1(tt-1|f), data=d,family=gaussian)

On analysing the model fit, it does a good job of identifying the phi parameter with an estimate of 0.85:

summary(mod)
...    
Conditional model:
 Groups   Name        Variance Std.Dev. Corr      
 f        (Intercept) 4.15645  2.0387             
 f.1      tt1         1.06157  1.0303   0.85 (ar1)
 Residual             0.04183  0.2045             
Number of obs: 400, groups:  f, 20
...

Here are my questions:

1) Why does the package notation require time tt as a random slope for ar1() to work? One consequence of this seems to be when I use the predict() function it seems to have the retrospective benefit of knowing how each group performed at each time interval, leading to predictions very close to the actual value y. Using a reproducible example where phi=0:

d <- simCor1(phi = 0, npergrp = 100, seed = 50)
ar1mod <- glmmTMB(y~1 + (1|f) + ar1(tt-1|f), data=d, family=gaussian)
simpmod <- glmmTMB(y~1 + (1|f), data=d, family=gaussian)

Even though phi is estimated to be effectively 0 by ar1mod and therefore concluding that the ar1 component of the model is effectively useless, the predictions between the two models seem wildly different:

d$errar1 <- d$y - predict(ar1mod)
d$errsimp <- d$y - predict(simpmod)
sqrt(mean(d$errar1 ^ 2))
## 0.0005218347
sqrt(mean(d$errsimp ^ 2))
## 0.9983432

2) In seeking to understand why tt is used as a random slope, I tried to recreate the calculation that glmmTMB must be doing to determine its estimate of phi. I thought that, perhaps, by taking the mean coefficient of tt at each time interval, and then determining the autocorrelation between the lagged tt values, we'd arrive at the glmmTMB estimate of the phi parameter. Using mod from earlier to try to recreate the estimated .85 parameter:

MeanTT <- colMeans(ranef(mod)$cond$f)[-1]
MeanTTlag1 <- MeanTT[-1]
MeanTTlag0 <- MeanTT[-length(MeanTT)]
cor(MeanTTlag1, MeanTTlag0)
## 0.430239

Well short of the 0.85. Why does this calculation not arrive at the 0.85 value and what is glmmTMB instead doing to calculate phi?

Will T-E
  • 429
  • 1
  • 6
  • 14

0 Answers0