Background
I'm currently doing some work comparing various Bayesian hierarchical models. The data $y_{ij}$ are numeric measures of well-being for participant $i$ at time $j$. I have around 1000 participants and 5 to 10 observations per participant.
As with most longitudinal datasets, I expect to see some form of auto-correlation, whereby observations that are closer in time are more strongly correlated than those further apart. Simplifying a few things, the basic model is as follows:
$$y_{ij} \sim N(\mu_{ij}, \sigma^2)$$
where I am comparing a no lag model:
$$\mu_{ij} = \beta_{0i}$$
with a lag model:
$$\mu_{ij} = \beta_{0i} + \beta_{1} (y_{i(j-1)} - \beta_{0i}) $$
where $\beta_{0i}$ is a person-level mean and $\beta_1$ is the lag parameter (i.e., the lag effect adds a multiple of the deviation of the previous observation from its predicted value). I've also had to do a few things to estimate $y_{i0}$ (i.e., the implied observation prior to the first observation).
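To make the structure concrete, here is a simplified JAGS-style sketch of the lag model (illustrative only; the priors and the treatment of $y_{i0}$ as a latent draw from the person-level distribution are simplifications rather than the exact code I ran):

```r
# Simplified JAGS-style sketch of the lag model (illustrative only).
# y is an N x max(J) matrix (NA-padded); J[i] is the number of observations for person i.
lag_model_string <- "
model {
  for (i in 1:N) {
    # latent 'observation before the first observation', used to form the lag at j = 1
    y0[i] ~ dnorm(beta0[i], tau)
    mu[i, 1] <- beta0[i] + beta1 * (y0[i] - beta0[i])
    y[i, 1] ~ dnorm(mu[i, 1], tau)
    for (j in 2:J[i]) {
      mu[i, j] <- beta0[i] + beta1 * (y[i, j - 1] - beta0[i])
      y[i, j] ~ dnorm(mu[i, j], tau)
    }
    beta0[i] ~ dnorm(mu_beta0, tau_beta0)  # person-level means
  }
  beta1     ~ dunif(-1, 1)                 # lag parameter
  mu_beta0  ~ dnorm(0, 1.0E-4)
  tau       ~ dgamma(0.001, 0.001)
  tau_beta0 ~ dgamma(0.001, 0.001)
  sigma     <- 1 / sqrt(tau)
}
"
```

The no lag model is identical except that $\mu_{ij} = \beta_{0i}$ (i.e., `beta1` fixed at zero).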
The results I am getting indicate that:
- The lag parameter is around .18, 95% CI [.14, .21]; i.e., it is non-zero
- Mean deviance and the DIC both increase by several hundred when the lag is included in the model
- Posterior predictive checks show that by including the lag effect, the model is better able to recover the auto-correlation in the data
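For reference, the posterior predictive check on the auto-correlation was along these lines (a minimal sketch; the names and shapes of `y`, a persons-by-time matrix of observed data, and `yrep`, an iterations-by-persons-by-time array of posterior predictive draws, are placeholders):

```r
# Minimal sketch of a posterior predictive check on lag-1 autocorrelation.
# Assumes y (persons x time, NA-padded) and yrep (iterations x persons x time).
lag1_stat <- function(ymat) {
  # average within-person lag-1 correlation
  mean(apply(ymat, 1, function(x) {
    cor(x[-length(x)], x[-1], use = "pairwise.complete.obs")
  }), na.rm = TRUE)
}

obs_stat <- lag1_stat(y)               # observed test statistic
rep_stat <- apply(yrep, 1, lag1_stat)  # one statistic per posterior draw

hist(rep_stat); abline(v = obs_stat, col = "red")  # compare observed vs replicated
mean(rep_stat >= obs_stat)                         # posterior predictive p-value
```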
So in summary, the non-zero lag parameter and the posterior predictive checks suggest the lag model is better; yet mean deviance and DIC suggest that the no lag model is better. This puzzles me.
My general experience is that if you add a useful parameter it should at least reduce the mean deviance (even if, after the complexity penalty, the DIC is not improved). Furthermore, a value of zero for the lag parameter would achieve the same deviance as the no lag model.
Question
Why might adding a lag effect increase mean deviance in a Bayesian hierarchical model, even when the lag parameter is non-zero and it improves posterior predictive checks?
Initial thoughts
- I've done a lot of convergence checks (e.g., looking at traceplots; examining variation in deviance results across chains and across runs) and both models seem to have converged on the posterior.
- I've done a code check where I forced the lag effect to be zero, and this did recover the no lag model deviances.
- I also looked at mean deviance minus the penalty (i.e., $\bar{D} - p_D = \hat{D}$, the deviance at the posterior expectations of the parameters), and this also made the lag model appear worse.
- Perhaps the lag effect reduces the effective number of observations per person, which reduces the certainty in estimating the person-level means ($\beta_{0i}$), which in turn increases deviance.
- Perhaps there is some issue with how I've estimated the implied time point before the first observation.
- Perhaps the lag effect is just weak in this data.
- I tried estimating the model by maximum likelihood using `lme` with `correlation = corAR1()` (see the sketch after this list). The estimate of the lag parameter was very similar. In this case the lag model had a larger log likelihood and a smaller AIC (by about 100) than the model without a lag (i.e., it suggested the lag model was better). This reinforced the idea that adding the lag should also lower the deviance in the Bayesian model.
- Perhaps there is something special about Bayesian residuals. The lag model uses the difference between the predicted and actual $y$ at the previous time point, and this quantity is uncertain; thus, the lag effect is effectively operating over a credible interval of such residual values.
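For completeness, the `lme` comparison was roughly as follows (a sketch assuming a long-format data frame `dat` with placeholder columns `y`, `id`, and `time`):

```r
library(nlme)

# Random-intercept model, without and with an AR(1) within-person correlation structure
fit_nolag <- lme(y ~ 1, random = ~ 1 | id, data = dat, method = "ML")
fit_lag   <- lme(y ~ 1, random = ~ 1 | id, data = dat, method = "ML",
                 correlation = corAR1(form = ~ time | id))

# Likelihood-based comparison: the AR(1) model had the larger log likelihood
# and an AIC smaller by about 100
anova(fit_nolag, fit_lag)
```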