I am analysing a longitudinal dataset of drug use in R. It contains the dependent variable `use`, measured twice, which I want to regress on several independent variables to see whether they can predict changes in use between the two timepoints.
The three time-invariant independent variables are `age`, `sex` and `bias`, the last being a measure of implicit cognitive bias towards the drug. My hypothesis is that `bias` can predict changes in drug use.
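For the eventual conditional model, one common way to let a time-invariant predictor explain *change* is a `time × bias` interaction. With `time` coded 0/1, the `time:bias` coefficient is the difference in the wave-1-to-wave-2 change in `use` per unit of `bias`. The sketch below is illustrative only. It simulates a hypothetical long-format stand-in for `df` called `sim`, with one row per person per wave and columns `id`, `time`, `use`, `age`, `sex` and `bias`. The effect sizes, the sample size, the random-intercept-only structure and the use of `method = "ML"` are all assumptions, not properties of the real data.

```r
library(nlme)

set.seed(42)
n_id <- 150

# Hypothetical person-level (time-invariant) predictors
person <- data.frame(
  id   = factor(seq_len(n_id)),
  age  = round(runif(n_id, 18, 40)),
  sex  = factor(sample(c("female", "male"), n_id, replace = TRUE)),
  bias = rnorm(n_id)
)

# Long format: each person appears twice, at time = 0 and time = 1
sim <- person[rep(seq_len(n_id), each = 2), ]
sim$time <- rep(c(0, 1), times = n_id)

# Assumed data-generating process: higher bias -> larger increase in use
u0 <- rnorm(n_id, sd = 1)                  # person-level intercept deviations
sim$use <- 3 + 0.4 * sim$time * sim$bias +
  u0[as.integer(sim$id)] + rnorm(nrow(sim), sd = 0.5)

# Random intercept only; ML so that models with different fixed effects
# can later be compared with anova()
fit <- lme(use ~ time * bias + age + sex, random = ~ 1 | id,
           data = sim, method = "ML")

# Row "time:bias": estimated difference in change per unit of bias
summary(fit)$tTable["time:bias", ]
```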
To assess the need for a multilevel model I have used the book *Applied Longitudinal Data Analysis* by Singer & Willett (available here, with the relevant R code here). The authors suggest that, before regressing on predictors, I should assess "whether there is hope for future analyses" by fitting two unconditional models that "partition and quantify the outcome variation in two important ways: first, across people without regard to time (the unconditional means model), and second, across both people and time (the unconditional growth model)." As I understand it, the second model should fit better if there is considerable systematic variation over time in the dependent variable that is worth exploring with predictors.
I fitted these models with the R package `nlme` as follows:
```r
library(nlme)

model.a <- lme(use ~ 1, random = ~ 1 | id, data = df)        # unconditional means model
model.b <- lme(use ~ time, random = ~ time | id, data = df)  # unconditional growth model
```
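Singer & Willett summarise the first partition, from the unconditional means model, with the intraclass correlation ρ = τ₀₀ / (τ₀₀ + σ²), which is the share of the total variation in `use` that lies between people. Below is a minimal sketch of computing it from a fit like `model.a`. It assumes a hypothetical simulated two-wave data set `sim` with made-up variance values. On the real data, only the `lme()` call and the `VarCorr()` extraction would be needed.

```r
library(nlme)

set.seed(1)
n_id <- 100

# Hypothetical two-wave data in long format (one row per person per wave)
sim <- data.frame(
  id   = factor(rep(seq_len(n_id), each = 2)),
  time = rep(c(0, 1), times = n_id)
)
u0 <- rnorm(n_id, sd = 1)                  # between-person deviations
sim$use <- 2 + u0[as.integer(sim$id)] + rnorm(nrow(sim), sd = 0.7)

# Unconditional means model (same specification as model.a)
model.a <- lme(use ~ 1, random = ~ 1 | id, data = sim)

# VarCorr() returns a character matrix: first row = intercept variance (tau00),
# last row = level-1 residual variance (sigma^2)
vc     <- VarCorr(model.a)
vars   <- as.numeric(vc[, "Variance"])
tau00  <- vars[1]
sigma2 <- vars[length(vars)]

icc <- tau00 / (tau00 + sigma2)            # share of variance between people
icc
```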
In Singer & Willett's example data, the second model (`model.b`) gives a drop in the level-1 residual deviance, and its AIC and BIC are also lower than those of `model.a`, indicating a better fit (shown here in their R code).
In contrast, for my models AIC and BIC increase:
```
> anova(model.a, model.b)
        Model df      AIC      BIC    logLik   Test    L.Ratio p-value
model.a     1  3 1058.395 1068.320 -526.1975
model.b     2  6 1064.356 1084.176 -526.1781 1 vs 2 0.03872928   0.998
```
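The columns of this table are linked by simple arithmetic, which helps show what the comparison says. AIC = −2·logLik + 2·df. The likelihood-ratio statistic is 2 × (difference in logLik), referred to a χ² distribution with 6 − 3 = 3 degrees of freedom: the fixed `time` effect, the random-slope variance and the intercept–slope covariance. The check below uses the rounded values copied from the output above, so the statistic differs slightly from the printed 0.03872928. One caveat: `lme()` fits by REML by default, and REML likelihoods of models with different fixed effects (here `~ 1` vs `~ time`) are not directly comparable. nlme's `anova()` warns about this, and the usual remedy is to refit both models with `method = "ML"` before comparing them.

```r
# Values copied (rounded) from the anova(model.a, model.b) output above
loglik <- c(model.a = -526.1975, model.b = -526.1781)
n_par  <- c(model.a = 3, model.b = 6)

# AIC = -2 * logLik + 2 * (number of estimated parameters)
aic <- -2 * loglik + 2 * n_par
aic

# Likelihood-ratio statistic and p-value on 6 - 3 = 3 df
lr <- 2 * unname(loglik["model.b"] - loglik["model.a"])
p  <- pchisq(lr, df = n_par[["model.b"]] - n_par[["model.a"]], lower.tail = FALSE)
c(L.Ratio = lr, p.value = p)
```

The gain in log-likelihood from the three extra parameters is about 0.02, so the AIC penalty of 2 per parameter dominates and AIC rises by roughly 6.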
As I understand Singer & Willett, it is arguably inappropriate to model these data with a multilevel model because of the decrease in fit, but I am not sure I understand why.
Question 1: Why is it inappropriate to model this using multilevel modelling?
Question 2: Is the second model not a better fit because I only have two timepoints for the dependent variable, whereas Singer & Willett's example has three?
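One way to see why two waves are a special case for `model.b` is to count parameters. Take `time` coded 0/1 (an illustrative assumption) and assume independent, homoscedastic level-1 errors (the `lme()` default). The random-intercept-and-slope model then implies Var(y₁) = τ₀₀ + σ², Cov(y₁, y₂) = τ₀₀ + τ₀₁ and Var(y₂) = τ₀₀ + 2τ₀₁ + τ₁₁ + σ² for each person's two scores. That gives three observable quantities for four variance parameters, so different parameter sets can imply exactly the same covariance matrix. The sketch below checks this numerically with two hand-picked parameter sets.

```r
# Random-effects design for one person: column 1 = intercept, column 2 = time
Z <- cbind(1, c(0, 1))

# Covariance of a person's two scores implied by random intercept + slope
# (tau00, tau01, tau11) plus independent level-1 residual variance sigma2
implied_cov <- function(tau00, tau01, tau11, sigma2) {
  G <- matrix(c(tau00, tau01, tau01, tau11), nrow = 2)
  Z %*% G %*% t(Z) + sigma2 * diag(2)
}

# Two different, both valid (positive-definite G) parameter sets
cov_a <- implied_cov(tau00 = 1.00, tau01 =  0.00, tau11 = 0.50, sigma2 = 0.50)
cov_b <- implied_cov(tau00 = 1.25, tau01 = -0.25, tau11 = 1.00, sigma2 = 0.25)

cov_a
all.equal(cov_a, cov_b)   # TRUE: two-wave data cannot tell these apart
```

Under these assumptions, two waves cannot pin down the random-slope variance separately from the level-1 residual variance, even if `lme()` still returns estimates for them.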
The most relevant CV thread I have found is "Under what conditions should one use multilevel/hierarchical analysis?", but I have not found any answers there that address this specific case.
Thanks for reading!