My question concerns the setup of a mixed model, which is designed to investigate the interaction between two different interventions over time. I will provide a sample data-set in R, but I am generally happy about any theoretical advice.
This is the scenario. It's a repeated measures dataset with two different interventions (food and training), each of them have two levels. All participants underwent each combination of conditions. Hormone samples were collected over the course of the day. Age and BMI are included as covariates.
This is a reproducible data set:
ID <- rep(rep(letters[1:16],each=8),4)
TIME <- rep(rep(paste(sprintf("%02i",9:16),"00",sep=":"),16),4)
training <- rep(rep(rep(c("T1","T2"),each=8),each=16),2)
food <- c(rep(rep(c("F1","F2"),each=8),each=16),rep(rep(c("F2","F1"),each=8),each=16))
hormone <- rnorm(n = 512,mean=5,sd=2)
BMI <- as.numeric(rep(rep(sample(18:35,size = 16,replace = T),each=8),4))
age <- as.numeric(rep(rep(sample(40:60,size = 16,replace=T),each=8),4))
DF <- data.frame(ID,TIME,training,food,hormone,BMI,age)
indNA <- which(DF$hormone%in%sample(DF$hormone,23))
indNABMI <- which(DF$ID%in%sample(DF$ID,2))
DF$BMI[indNABMI] <- NA
DF$hormone[indNA] <- NA
DF$BMI <- scale(DF$BMI)
DF$age <- scale(DF$age)
Now in order to model the effect of Time, I have found various solutions, as for example here: Is time of the day (predictor in regression) a categorical or a continuous variable? where it is suggested to include daytime as a circular variable. However, my sampling does not cover the entire day, so am I right in assuming this is not an option for me?
I would then argue for including it as an ordered factor variable as I did here:
DF$TIME <- ordered(DF$TIME)
str(DF$TIME)
Ord.factor w/ 8 levels "09:00"<"10:00"<..: 1 2 3 4 5 6 7 8 1 2 ...
Next, I am unsure about how to include this in my model. As I said, my research question is about the interaction of the 2 intervention types, so I would come up with the following model:
m1 <- lmer(hormone~training*food*TIME+age+BMI+(1+training*food*TIME|ID),DF)
which I would test against the according NULL-Model: (if you see any flaws in the model setup to begin with, please let me know)
m01 <- lmer(hormone~training*TIME+food*TIME+age+BMI+(1+training*TIME+food*TIME|ID),DF)
However I do not have enough observations to test this model and I therefore get this error-message:
Error: number of observations (=427) <= number of random effects (=448) for term (1 + training * food * TIME | ID); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable
and the NULL-Model does not converge either.
How can I alternatively include the effect of time in my random effects? I would especially like to include it, since hormones display diurnal fluctuations, which might again differ between participants but potentially also between conditions.
In addition, this dataset includes data from a pharmacological test, where I expect the hormone levels to rise and drop again over time. The shape would thus resemble a quadratic shape rather than a linear one?
Would I in that case, include the interaction of conditions and quadratic time (as a numerical predictor)?