How to include time factor in a mixed model (investigating the interaction between two factors and time)?

Question

My question concerns the setup of a mixed model, which is designed to investigate the interaction between two different interventions over time. I will provide a sample data-set in R, but I am generally happy about any theoretical advice.

This is the scenario. It's a repeated measures dataset with two different interventions (food and training), each of them have two levels. All participants underwent each combination of conditions. Hormone samples were collected over the course of the day. Age and BMI are included as covariates.

This is a reproducible data set:

ID <- rep(rep(letters[1:16],each=8),4)
TIME <- rep(rep(paste(sprintf("%02i",9:16),"00",sep=":"),16),4)
training <- rep(rep(rep(c("T1","T2"),each=8),each=16),2)
food <- c(rep(rep(c("F1","F2"),each=8),each=16),rep(rep(c("F2","F1"),each=8),each=16))
hormone <- rnorm(n = 512,mean=5,sd=2)
BMI <- as.numeric(rep(rep(sample(18:35,size = 16,replace = T),each=8),4))
age <- as.numeric(rep(rep(sample(40:60,size = 16,replace=T),each=8),4))
DF <- data.frame(ID,TIME,training,food,hormone,BMI,age)

indNA <- which(DF$hormone%in%sample(DF$hormone,23))
indNABMI <- which(DF$ID%in%sample(DF$ID,2))
DF$BMI[indNABMI] <- NA
DF$hormone[indNA] <- NA

DF$BMI <- scale(DF$BMI)
DF$age <- scale(DF$age)

Now in order to model the effect of Time, I have found various solutions, as for example here: Is time of the day (predictor in regression) a categorical or a continuous variable? where it is suggested to include daytime as a circular variable. However, my sampling does not cover the entire day, so am I right in assuming this is not an option for me?

I would then argue for including it as an ordered factor variable as I did here:

DF$TIME <- ordered(DF$TIME)
str(DF$TIME)
Ord.factor w/ 8 levels "09:00"<"10:00"<..: 1 2 3 4 5 6 7 8 1 2 ...

Next, I am unsure about how to include this in my model. As I said, my research question is about the interaction of the 2 intervention types, so I would come up with the following model:

m1 <- lmer(hormone~training*food*TIME+age+BMI+(1+training*food*TIME|ID),DF)

which I would test against the according NULL-Model: (if you see any flaws in the model setup to begin with, please let me know)

m01 <- lmer(hormone~training*TIME+food*TIME+age+BMI+(1+training*TIME+food*TIME|ID),DF)

However I do not have enough observations to test this model and I therefore get this error-message:

Error: number of observations (=427) <= number of random effects (=448) for term (1 + training * food * TIME | ID); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable

and the NULL-Model does not converge either.

How can I alternatively include the effect of time in my random effects? I would especially like to include it, since hormones display diurnal fluctuations, which might again differ between participants but potentially also between conditions.

In addition, this dataset includes data from a pharmacological test, where I expect the hormone levels to rise and drop again over time. The shape would thus resemble a quadratic shape rather than a linear one?

Would I in that case, include the interaction of conditions and quadratic time (as a numerical predictor)?

score 3 · Accepted Answer · answered Jan 31 '18 at 16:44

There are two mostly separate issues here: (1) how to model time, and (2) how to incorporate among-individual variation in response to time. The connection is that modeling time in a way that uses too many parameters will mean that you won't have enough data to model among-individual variation, because a general model of the variation of an effect that requires $n$ parameters to describe will require $n(n+1)/2$ variance-covariance parameters.

The fact that your data are not taken over the entire 24 hours actually makes things a little easier, as you don't have to worry about enforcing continuity (and possibly smoothness) at the boundary between days. Some of your choices, given that you have 8 distinct time points from 0900 to 1600:

unordered factor: 8 parameters, (8*9)/2 = 36 variance-covariance parameters
ordered factor: this would have the same complexity as an unordered factor, give the same overall goodness-of-fit, etc.; the only difference is in the way the model is parameterized
sinusoidal variation (as in the linked question): 2 parameters, 3 var-cov parameters
circular splines - adjustable complexity

However, especially if your number of subjects is limited to 16 or that approximate order of magnitude, I would recommend a simple linear model - that is, treat time as continuous and assume that there is a simple linear trend. If something more complex is going on (which you might detect when looking at residual plots), you could use a quadratic model or a regression spline.

How to include time factor in a mixed model (investigating the interaction between two factors and time)?

1 Answers1

Related