
I have the following data:

data <- structure(list(sample = 1:8, time1 = c(0.52, 0.5, 0.48, 0.4, 
0.36, 0.3, 0.28, 0.28), time2 = c(0.53, 0.51, 0.48, 0.41, 0.36, 
0.32, 0.3, 0.29)), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8"))

and its long form:

d_long <- structure(list(sample = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L), time = structure(c(2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("time1", 
"time2"), class = "factor"), value = c(0.52, 0.5, 0.48, 0.4, 
0.36, 0.3, 0.28, 0.28, 0.53, 0.51, 0.48, 0.41, 0.36, 0.32, 0.3, 
0.29)), row.names = c(NA, -16L), class = "data.frame")
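
For reference, a minimal sketch of one way to build the long form from the wide data with base reshape(); the resulting row order and factor coding may come out slightly differently from the dput above:

# reshape wide -> long: one row per (sample, time) combination
d_long2 <- reshape(data,
                   direction = "long",
                   varying   = c("time1", "time2"),
                   v.names   = "value",
                   timevar   = "time",
                   times     = c("time1", "time2"),
                   idvar     = "sample")
d_long2$time <- factor(d_long2$time, levels = c("time1", "time2"))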

For simple illustration, let's say I want to "mimic" a paired t test; in reality I will run a longitudinal analysis with more time points, t1...tn. For now we have just t1 and t2.

> t.test(data$time1, data$time2, paired = T) 

    Paired t-test

data:  data$time1 and data$time2
t = -3.7417, df = 7, p-value = 0.007247
...

lme does this easily: it recognizes that each sample has 2 measurements, so DF = 7 for the within-sample comparison, as in the paired test:

> summary(lme(value ~ time,  random= ~1|sample, d_long))
Linear mixed-effects model fit by REML
 Data: d_long 
        AIC       BIC   logLik
  -47.56364 -45.00741 27.78182

Random effects:
 Formula: ~1 | sample
        (Intercept)    Residual
StdDev:  0.09841603 0.005345225

Fixed effects: value ~ time 
            Value  Std.Error DF   t-value p-value
(Intercept) 0.395 0.03482097  7 11.343739  0.0000
time1       0.005 0.00133631  7  3.741657  0.0072

But gls() does not: denom. df = 14, not 7!

> anova(gls(value ~ time,  correlation = corCompSymm(form = ~1|sample), d_long))
Denom. DF: 14 
            numDF  F-value p-value
(Intercept)     1 128.6804  <.0001
time            1  14.0000  0.0022

I want to use gls for modelling longitudinal data, as it allows me to use various correlation structures without specifying random effects, much like SAS does. I do not want to use mixed models, because a random-intercept model IS NOT equivalent to compound symmetry: I often get negative within-subject correlations, and I don't want them truncated to 0, as happens in lme4.
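
To make the kind of flexibility I mean concrete, here is a sketch of the correlation structures I have in mind (with only two time points several of them coincide, but with t1...tn they differ); this is roughly what SAS offers via REPEATED ... TYPE=CS/AR(1)/UN:

library(nlme)

# compound symmetry, AR(1), and unstructured within-sample covariance
fit_cs  <- gls(value ~ time, correlation = corCompSymm(form = ~ 1 | sample), data = d_long)
fit_ar1 <- gls(value ~ time, correlation = corAR1(form = ~ 1 | sample), data = d_long)
fit_un  <- gls(value ~ time,
               correlation = corSymm(form = ~ 1 | sample),
               weights     = varIdent(form = ~ 1 | time),   # allow a different variance per time
               data        = d_long)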

It seems that gls() ignores the "clustering" and only models the covariance within each sample, leaving the DF unadjusted...
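
A quick check on the fitted object seems consistent with that (a sketch; the within-sample correlation is still estimated, it just does not enter the DF):

library(nlme)

fit_gls <- gls(value ~ time,
               correlation = corCompSymm(form = ~ 1 | sample),
               data = d_long)

# residual DF used for the tests: N - p = 16 - 2 = 14, no adjustment for the grouping
length(residuals(fit_gls)) - length(coef(fit_gls))

# the estimated within-sample correlation (rho) is there nonetheless
coef(fit_gls$modelStruct$corStruct, unconstrained = FALSE)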

Is there any workaround? I know I can take the differences (t2 - t1) and give them to gls, but I noticed that lme doesn't need that to infer the DF properly. And what if I have more time points, say 3 or 5? I am afraid gls() will do what is called "fake data replication" when determining the p-values and CIs.
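
For completeness, this is the differences workaround I mean for the two-time-point case (a sketch; it obviously does not generalize nicely to 3 or 5 time points):

# one-sample analysis of the paired differences: 8 observations, df = 7,
# same p-value as the paired t-test above (sign flips because this is t2 - t1)
d_diff <- data.frame(sample = data$sample, diff = data$time2 - data$time1)
t.test(d_diff$diff)

# the same model as an intercept-only gls: residual df = 8 - 1 = 7
library(nlme)
summary(gls(diff ~ 1, data = d_diff))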

  • Note this is a follow-up to a previous question at https://stats.stackexchange.com/questions/486812/is-there-any-way-to-get-correct-degrees-of-freedom-in-gls-matching-those-of-pai – JTH Sep 09 '20 at 21:03
  • Yes, I followed your advice and started a new thread. Thank you for your help in the linked topic! – GibbsSampler10 Sep 09 '20 at 21:05
  • Why do you expect the degrees of freedom to be the same? The `gls` modeling has a different basis than the t-test. – EdM Sep 10 '20 at 17:43
  • But the mixed model handled it properly, so I thought this was important. That's exactly why we use these models for repeated observations - to not create "fake replications". So, if a mixed model correctly recognized it, shouldn't GLS, which - as I was told - is commonly used for this kind of analysis, do the same? Instead, GLS treats all the data as if it were unpaired. I thought I might have got it wrong, but if so, why is it justified to use it for such analyses? https://stats.stackexchange.com/questions/486946/is-it-ok-to-use-gls-generalized-least-square-to-analyze-repeated-data-what-ab – GibbsSampler10 Sep 10 '20 at 19:23
  • A mixed model also seems to have a different basis than the t-test, yet it reproduced the results perfectly. That's why I got confused about gls, which didn't "guess" that each subject was examined twice (or more times). For this reason I asked the second question cited in my previous comment. – GibbsSampler10 Sep 10 '20 at 19:25

0 Answers