2

I have data that looks like this: there is a group of 27 subjects with one dichotomous variable y1 at 3 times points. The probability of y1 is different between the 3 time points (100%, 85%, and 40% respectively). I want to prove that ther is a significant difference between each of the time points (between t1-t2, t1-t3 and t2-t3).

For this I would like to use a logistic regression (ideally a mixed model). I have the impression that there is a statistical difference between all the time point. This impression is confirmed at least by a simple chi-square test looking individually at difference (one test for each pair t1-t2, t1-t3 and t2-t3). However I can't find a logistic model that proves this. Instead, the logistic models seem to tell that there are no difference between t1-t2 and t1-t3.

Here is first 1. the simulation of the data, 2. the chi-square test, 3. fitting of a logistic mixed model, 4. fitting logistic regression 5. fitting logistic regression with a score test as pointed in this answer

1. Simulation of the data

    library(dplyr)
    library(multcomp)

    funda = 25
    y_sub1 = rbinom(funda, 1, 0.85)
    y_sub2 = rbinom(funda, 1, 0.4)
    y1 = c(rep(1, funda), y_sub1, y_sub2)
    y1[2*funda + 3] = NA
    time = c(rep("t1", funda), rep("t2", funda), 
              rep("t3", funda))
    ID_init = c()
    for (i in 1:funda){
      ID_init = c(ID_init, paste0("ID", i))
    }
    ID = rep(ID_init, 3)
    df = data.frame(y1, time, ID)

2. Chi square tests

    df_t1_t2 = subset(df, time == "t1" | time == "t2")
    chisq.test(df_t1_t2$y1, df_t1_t2$time)

    Chi-squared approximation may be incorrect
    Pearson's Chi-squared test with Yates' continuity correction

    data:  df_t1_t2$y1 and df_t1_t2$time
    X-squared = 5.9801, df = 1, p-value = 0.01447

    df_t1_t3 = subset(df, time == "t1" | time == "t3")
    chisq.test(df_t1_t3$y1, df_t1_t3$time)

    Pearson's Chi-squared test with Yates' continuity correction

    data:  df_t1_t3$y1 and df_t1_t3$time
    X-squared = 19.672, df = 1, p-value = 9.193e-06

    df_t2_t3 = subset(df, time == "t2" | time == "t3")
    chisq.test(df_t2_t3$y1, df_t2_t3$time)

    Pearson's Chi-squared test with Yates' continuity correction

    data:  df_t2_t3$y1 and df_t2_t3$time
    X-squared = 4.5791, df = 1, p-value = 0.03236

3. Logistic mixed model

    mod_glmer = glmer(y1 ~ time + (1|ID), data = df, 
                      family = binomial)
    mod_glmer = glht(mod_glmer, linfct = mcp(time = "Tukey"))
    mod_glmer = summary(mod_glmer)
    mod_glmer

    Error in length(value <- as.numeric(value)) == 1L : 
        Downdated VtV is not positive definite

4. logistic regression:

    mod_glm = glm(y1 ~ time, data = df, family = "binomial")
    mod_glm <- glht(mod_glm, linfct = mcp(time = "Tukey")) %>% 
                 summary()
    mod_glm

    Simultaneous Tests for General Linear Hypotheses

    Multiple Comparisons of Means: Tukey Contrasts

    Fit: glm(formula = y1 ~ time, family = "binomial", 
               data = df)

    Linear Hypotheses:
              Estimate Std. Error z value Pr(>|z|)    
    t2 - t1 == 0  -17.1237  2150.8027  -0.008 0.999960    
    t3 - t1 == 0  -20.4534  2150.8026  -0.010 0.999942    
    t3 - t2 == 0   -3.3297     0.8632  -3.857 0.000229 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    (Adjusted p values reported -- single-step method)

    **5. logistic regression:**
    mod_glm = glm(y1 ~ time, data = df, family = "binomial")
    mod_glm = anova(mod_glm, test = "Rao")
    mod_glm
    mod_glm = glht(mod_glm, linfct = mcp(time = "Tukey")) %>% summary()
    mod_glm

    Error in factor_contrasts(model) : no 'model.matrix' method 
      for 'model' found!
kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467
ecjb
  • 539
  • 1
  • 5
  • 16
  • Are you trying to compare probabilities of "succes" between pairs of time points (where "success" occurs whenever y1 = 1)? The models you have fitted are comparing log odds of "success" between pairs of time points, which is presumably not what you want. To compare probabilities of "success", you could use bootstrapping perhaps. Recall that the odds of success are defined as p/(1-p), where p is the probability of "success". – Isabella Ghement Sep 22 '18 at 21:27
  • Thank you for you comment @IsabellaGhement. I'm aware of that difference between probability and the logit(p) = log-odds(p) = $log(\frac{p}{1-p})$. What I'm more interested in is the difference of proportion of success / failure between 2 time point like the chi squared test above. Do you have an idea of a particular code? – ecjb Sep 22 '18 at 22:08

0 Answers0