I have data that looks like this: there is a group of 27 subjects with one dichotomous variable y1 at 3 times points. The probability of y1 is different between the 3 time points (100%, 85%, and 40% respectively). I want to prove that ther is a significant difference between each of the time points (between t1-t2, t1-t3 and t2-t3).
For this I would like to use a logistic regression (ideally a mixed model). I have the impression that there is a statistical difference between all the time point. This impression is confirmed at least by a simple chi-square test looking individually at difference (one test for each pair t1-t2, t1-t3 and t2-t3). However I can't find a logistic model that proves this. Instead, the logistic models seem to tell that there are no difference between t1-t2 and t1-t3.
Here is first 1. the simulation of the data, 2. the chi-square test, 3. fitting of a logistic mixed model, 4. fitting logistic regression 5. fitting logistic regression with a score test as pointed in this answer
1. Simulation of the data
library(dplyr)
library(multcomp)
funda = 25
y_sub1 = rbinom(funda, 1, 0.85)
y_sub2 = rbinom(funda, 1, 0.4)
y1 = c(rep(1, funda), y_sub1, y_sub2)
y1[2*funda + 3] = NA
time = c(rep("t1", funda), rep("t2", funda),
rep("t3", funda))
ID_init = c()
for (i in 1:funda){
ID_init = c(ID_init, paste0("ID", i))
}
ID = rep(ID_init, 3)
df = data.frame(y1, time, ID)
2. Chi square tests
df_t1_t2 = subset(df, time == "t1" | time == "t2")
chisq.test(df_t1_t2$y1, df_t1_t2$time)
Chi-squared approximation may be incorrect
Pearson's Chi-squared test with Yates' continuity correction
data: df_t1_t2$y1 and df_t1_t2$time
X-squared = 5.9801, df = 1, p-value = 0.01447
df_t1_t3 = subset(df, time == "t1" | time == "t3")
chisq.test(df_t1_t3$y1, df_t1_t3$time)
Pearson's Chi-squared test with Yates' continuity correction
data: df_t1_t3$y1 and df_t1_t3$time
X-squared = 19.672, df = 1, p-value = 9.193e-06
df_t2_t3 = subset(df, time == "t2" | time == "t3")
chisq.test(df_t2_t3$y1, df_t2_t3$time)
Pearson's Chi-squared test with Yates' continuity correction
data: df_t2_t3$y1 and df_t2_t3$time
X-squared = 4.5791, df = 1, p-value = 0.03236
3. Logistic mixed model
mod_glmer = glmer(y1 ~ time + (1|ID), data = df,
family = binomial)
mod_glmer = glht(mod_glmer, linfct = mcp(time = "Tukey"))
mod_glmer = summary(mod_glmer)
mod_glmer
Error in length(value <- as.numeric(value)) == 1L :
Downdated VtV is not positive definite
4. logistic regression:
mod_glm = glm(y1 ~ time, data = df, family = "binomial")
mod_glm <- glht(mod_glm, linfct = mcp(time = "Tukey")) %>%
summary()
mod_glm
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: glm(formula = y1 ~ time, family = "binomial",
data = df)
Linear Hypotheses:
Estimate Std. Error z value Pr(>|z|)
t2 - t1 == 0 -17.1237 2150.8027 -0.008 0.999960
t3 - t1 == 0 -20.4534 2150.8026 -0.010 0.999942
t3 - t2 == 0 -3.3297 0.8632 -3.857 0.000229 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
**5. logistic regression:**
mod_glm = glm(y1 ~ time, data = df, family = "binomial")
mod_glm = anova(mod_glm, test = "Rao")
mod_glm
mod_glm = glht(mod_glm, linfct = mcp(time = "Tukey")) %>% summary()
mod_glm
Error in factor_contrasts(model) : no 'model.matrix' method
for 'model' found!