I have the following data:
data <- structure(list(sample = 1:8, time1 = c(0.52, 0.5, 0.48, 0.4,
0.36, 0.3, 0.28, 0.28), time2 = c(0.53, 0.51, 0.48, 0.41, 0.36,
0.32, 0.3, 0.29)), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8"))
and its long form:
d_long <- structure(list(sample = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L), time = structure(c(2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("time1",
"time2"), class = "factor"), value = c(0.52, 0.5, 0.48, 0.4,
0.36, 0.3, 0.28, 0.28, 0.53, 0.51, 0.48, 0.41, 0.36, 0.32, 0.3,
0.29)), row.names = c(NA, -16L), class = "data.frame")
For simple illustration, let's say I want to "mimic" a paired t test, but actually I will run a longitudinal analysis with more times, like t1...tn. Now we have t1 and t2.
> t.test(data$time1, data$time2, paired = T)
Paired t-test
data: data$time1 and data$time2
t = -3.7417, df = 7, p-value = 0.007247
...
lme does it easily, recognizes, that each sample has 2 measurements, so the DF=7 at each time points:
> summary(lme(value ~ time, random= ~1|sample, d_long))
Linear mixed-effects model fit by REML
Data: d_long
AIC BIC logLik
-47.56364 -45.00741 27.78182
Random effects:
Formula: ~1 | sample
(Intercept) Residual
StdDev: 0.09841603 0.005345225
Fixed effects: value ~ time
Value Std.Error DF t-value p-value
(Intercept) 0.395 0.03482097 7 11.343739 0.0000
time1 0.005 0.00133631 7 3.741657 0.0072
But gls() does not: denom. df = 14, not 7!
> anova(gls(value ~ time, correlation = corCompSymm(form = ~1|sample), d_long))
Denom. DF: 14
numDF F-value p-value
(Intercept) 1 128.6804 <.0001
time 1 14.0000 0.0022
I want to use gls for modelling a longitudinal data, as it allows me to use various correlation structures without specifying random effects, like SAS does. I do not want to use mixed models, as random intercept model IS NOT equivalent to compound symmetry, I often get negative correlations, and don't want them to be made 0 in lme4.
It seems that GLS ignores the "clustering" and only models the covariance among samples, leaving the DF as they are...
Is there any work around? I know I can use differences (t2 - t1) and provide it to gls, but I noticed that lme doesn't need that to infer the DF properly. And what if I have more time points, say 3 or 5? I am afraid gls() will do what is called "fake data replication" in determining the p-values and CIs.