Correct model formula to account for crossed and nested simultaneously?

Question

I set some models to test some specific contrasts. But I'm not sure which will correctly address the repeated measurements over my individuals.

The dataset has:

id = subject id, factor with 83 levels
tissue = factor with 2 levels (within-subj)
time = factor with 2 levels (within-subj)
group = factor with 2 levels (between-subj)
batch = factor with 2 levels

This is not completely balanced (crossed) neither for time, tissue, nor batch.

I'm interested in testing only the following contrasts:

disease_tuberculosis - disease_other in tissue_lung and time_before_treat
disease_tuberculosis - disease_other in tissue_blood and time_before_treat
tissue_lung - tissue_blood in disease_tuberculosis and time_before_treat
tissue_lung - tissue_blood in disease_other and time_before_treat
Time_after_treat - time_before_treat in tissue_lung and disease_tuberculosis

I tried some models that could give me coefficients that I could specify contrasts to estimate these differences above.

My problem is, I'm not sure which one addresses the repeated measures on subjects and results in correct degrees of freedom and, consequently, P-values. I'm referring to the df reported in emmeans after setting up my contrasts.

I'm inclined to use the most simple model, the one with a random intercept for id, but I'm afraid that the df might be inflated. Alternatively, I thought about throwing out the batch variable by taking the average by batch per id.

I also appreciate if you have any suggestions of books that help learn this subject further.

Here's my code:

set.seed(10)
library("nlme")
library("emmeans")

# download dataset into df1 object
source("https://pastebin.com/raw/inQJ2kXy")

fit1 <- lme(
        iv ~ batch + disease * tissue + time,
        random = ~ 1 | id,
        data = df1,
        na.action = na.exclude,
        weights = varIdent(form = ~ 1 | tissue))

fit2 <- lme(
        iv ~ batch + disease * tissue + time,
        random = ~ 1 | id / batch,
        data = df1,
        na.action = na.exclude,
        weights = varIdent(form = ~ 1 | tissue))

fit3 <- lme(
        iv ~ batch + disease * tissue + time,
        random = ~ 1 | id / batch / tissue,
        data = df1,
        na.action = na.exclude,
        weights = varIdent(form = ~ 1 | tissue))

fit4 <- lme(
        iv ~ batch + disease * tissue + time,
        random = ~ 1 | id / batch / tissue / time,
        data = df1,
        na.action = na.exclude,
        weights = varIdent(form = ~ 1 | tissue))

fit5 <- lme(
        iv ~ batch + disease * tissue + time,
        random = ~ disease * tissue | id,
        data = df1,
        na.action = na.exclude,
        weights = varIdent(form = ~ 1 | tissue))

anova(fit1, fit2, fit3, fit4, fit5)

# Using emmeans to get specific contrasts and adjust for multiple comparisons with MVT

# contrast data.frame
cont_oi <-
        data.frame(
                "tub-other_lung_BF" = c(0, 0, 0, 0, 1, -1, 0, 0),
                "tub-other_blood_BF" = c(0, 0, 0, 0, 0, 0, 1, -1),
                "lung-blood_tuber_BF" = c(0, 0, 0, 0, 1, -1, 0, 0),
                "lung-blood_other_BF" = c(0, 0, 0, 0, 0, 1, 0, -1),
                "AT-BT_blood_tub" = c(0, 0, 1, 0, 0, 0, -1, 0)
        )

# apply contrasts and ajust P-values using the ~exact mvt
# (multivariate t distribution)

res <- emmeans(fit1, ~disease+tissue+time, 
               data = df1,
               contr = cont_oi,
               adjust = "mvt")

res$contrasts

# results are averaged between batch levels

Am I understanding this correctly? You have 159 observations of `iv`, the outcome variable. It refers to a specific `tissue`, which is nested within a person `id`. And then each observation is also nested within a `batch`? Also, see my (and the other great) answers here: https://stats.stackexchange.com/questions/275450/when-to-use-mixed-effect-model/275460#275460 — Mark White, Nov 19 '21 at 04:34
Thank you, Mark. Yes, that's it. I have two tissues and two times for the same individual. The whole experiment is partially distributed into two batches (partial confounding with the time). I received another suggestion of separating this into two models. One estimating the effects for tissue and disease and their interaction. Another for estimating just the time effect (subseted to only the disease and tissue that has both time points). — BioLeal, Nov 19 '21 at 09:26

Correct model formula to account for crossed and nested simultaneously?

0 Answers0