Let's assume I have the following data set, consisting of pre (t1) and post (t2) measurements. I want to check whether there is a significant effect over time (t1 vs. t2).
d <- structure(list(pre = c(1, 2, 2.3, 1.3, 4.4, 3.4, 5.3, 2.2, 4.4,
3.3, 5.5, 7, 8.8, 3.3, 6, 7.7, 2.2), post = c(2, 4.3, 7.1, 8.1,
5.3, 4.3, 3.2, 5.7, 6.5, 4.4, 6.6, 7, 9.9, 6.6, 6, 6.6, 8.8),
diff = c(1, 2.3, 4.8, 6.8, 0.899999999999999, 0.9, -2.1,
3.5, 2.1, 1.1, 1.1, 0, 1.1, 3.3, 0, -1.1, 6.6)), .Names = c("pre",
"post", "diff"), row.names = c(NA, -17L), class = "data.frame")
It was tempting to use the paired t.test:
> t.test(d$pre, d$post, paired = TRUE)
Paired t-test
data: d$pre and d$post
t = -3.1952, df = 16, p-value = 0.005634
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.1605916 -0.6394084
sample estimates:
mean of the differences
-1.9
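As a quick cross-check (a sketch using only base R), the same numbers should fall out of a one-sample t-test on the diff column, with the sign flipped, since diff = post - pre:

```r
# The paired t-test on (pre, post) is equivalent to a one-sample
# t-test on the per-subject differences.
d <- data.frame(
  pre  = c(1, 2, 2.3, 1.3, 4.4, 3.4, 5.3, 2.2, 4.4,
           3.3, 5.5, 7, 8.8, 3.3, 6, 7.7, 2.2),
  post = c(2, 4.3, 7.1, 8.1, 5.3, 4.3, 3.2, 5.7, 6.5,
           4.4, 6.6, 7, 9.9, 6.6, 6, 6.6, 8.8)
)
d$diff <- d$post - d$pre

paired <- t.test(d$pre, d$post, paired = TRUE)  # mean of differences: -1.9
onesmp <- t.test(d$diff)                        # mean of differences: +1.9

# t statistics agree up to sign; p-values agree exactly
c(paired = unname(paired$statistic), onesample = unname(onesmp$statistic))
```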
Then I quickly checked whether I could reproduce this using lm():
> summary(lm( (post-pre) ~ 1, data=d))
Call:
lm(formula = (post - pre) ~ 1, data = d)
Residuals:
Min 1Q Median 3Q Max
-4.0 -1.0 -0.8 1.4 4.9
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.9000 0.5946 3.195 0.00563 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.452 on 16 degrees of freedom
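The intercept-only lm() should also give back the t-test's confidence interval; a small sketch (sign flipped because the model is fitted on post - pre):

```r
d <- data.frame(
  pre  = c(1, 2, 2.3, 1.3, 4.4, 3.4, 5.3, 2.2, 4.4,
           3.3, 5.5, 7, 8.8, 3.3, 6, 7.7, 2.2),
  post = c(2, 4.3, 7.1, 8.1, 5.3, 4.3, 3.2, 5.7, 6.5,
           4.4, 6.6, 7, 9.9, 6.6, 6, 6.6, 8.8)
)

fit <- lm((post - pre) ~ 1, data = d)
tt  <- t.test(d$pre, d$post, paired = TRUE)

# Same interval, opposite sign:
confint(fit)  # ~ ( 0.639, 3.161)
tt$conf.int   # ~ (-3.161, -0.639)
```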
OK, it works. Now I wanted to adjust for the pre value, as is commonly accepted when analysing data from observational trials. I cannot use t.test any more, so I switched to lm(). First, I converted the data to long format (base R only, no third-party packages):
dl <- reshape(data=d, idvar="subj", varying = c("pre", "post"), times=c("pre", "post"), v.name=c("value"), direction="long", new.row.names = 1:100)
With result:
> rbind(head(dl,3), tail(dl,3))
diff time value subj
1.pre 1.0 pre 1.0 1
2.pre 2.3 pre 2.0 2
3.pre 4.8 pre 2.3 3
15.post 0.0 post 6.0 15
16.post -1.1 post 6.6 16
17.post 6.6 post 8.8 17
Now I also added the pre values to the data frame:
dl$pre <- c(d$pre, d$pre)
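The c(d$pre, d$pre) trick relies on reshape() stacking all pre rows before all post rows. A slightly more defensive sketch (assuming subj holds the original row index 1..17, which is what reshape() creates here) indexes by subject instead:

```r
d <- data.frame(
  pre  = c(1, 2, 2.3, 1.3, 4.4, 3.4, 5.3, 2.2, 4.4,
           3.3, 5.5, 7, 8.8, 3.3, 6, 7.7, 2.2),
  post = c(2, 4.3, 7.1, 8.1, 5.3, 4.3, 3.2, 5.7, 6.5,
           4.4, 6.6, 7, 9.9, 6.6, 6, 6.6, 8.8)
)
dl <- reshape(data = d, idvar = "subj", varying = c("pre", "post"),
              times = c("pre", "post"), v.names = "value", direction = "long")

# Look up each long-format row's baseline value via its subject id,
# so the result does not depend on row order.
dl$pre <- d$pre[dl$subj]
```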
Then I fitted a mixed model with a random intercept per subject:
> summary(mm <- lme4::lmer(value ~ time + pre + (1|subj), data=dl))
singular fit
Linear mixed model fit by REML ['lmerMod']
Formula: value ~ time + pre + (1 | subj)
Data: dl
REML criterion at convergence: 126.5
Scaled residuals:
Min 1Q Median 3Q Max
-2.2960 -0.5785 -0.1819 0.4949 2.5783
Random effects:
Groups Name Variance Std.Dev.
subj (Intercept) 1.097e-15 3.313e-08
Residual 2.449e+00 1.565e+00
Number of obs: 34, groups: subj, 17
Fixed effects:
Estimate Std. Error t value
(Intercept) 3.3254 0.6244 5.326
timepre -1.9000 0.5368 -3.539
pre 0.6543 0.1202 5.442
Correlation of Fixed Effects:
(Intr) timepr
timepre -0.430
pre -0.794 0.000
convergence code: 0
singular fit
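Because the fit is singular (the subject-level variance is estimated as essentially zero), the random intercept contributes nothing here, and I would expect the mixed model to collapse to an ordinary fixed-effects regression. A sketch of the equivalent plain lm() on the long data:

```r
d <- data.frame(
  pre  = c(1, 2, 2.3, 1.3, 4.4, 3.4, 5.3, 2.2, 4.4,
           3.3, 5.5, 7, 8.8, 3.3, 6, 7.7, 2.2),
  post = c(2, 4.3, 7.1, 8.1, 5.3, 4.3, 3.2, 5.7, 6.5,
           4.4, 6.6, 7, 9.9, 6.6, 6, 6.6, 8.8)
)
dl <- reshape(data = d, idvar = "subj", varying = c("pre", "post"),
              times = c("pre", "post"), v.names = "value", direction = "long")
dl$pre <- c(d$pre, d$pre)  # baseline value repeated for both time points

# With zero subject-level variance, OLS should yield the same
# fixed-effect estimates as the (singular) lmer fit.
fit <- lm(value ~ time + pre, data = dl)
coef(fit)[c("timepre", "pre")]  # ~ -1.9 and ~ 0.654
```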
Then, I calculated the CI for the mean difference (mm holds the result of lmer):
> confint(mm, method="Wald")
2.5 % 97.5 %
.sig01 NA NA
.sigma NA NA
(Intercept) 2.1016860 4.5491895
timepre -2.9521169 -0.8478831
pre 0.4186819 0.8899503
The CI for "timepre" is shorter than the CI returned by the paired t.test, which is expected. The estimate of the difference is (ignoring the sign) still 1.9, and t is now larger in magnitude (I believe because pre is included as a covariate).
Now, my question is: am I doing this properly? All I need is a "paired t-test adjusted for pre" and a CI for the adjusted difference.
By the way, in clinical research data like this are often summarized as: mean_t1, mean_t2, difference, CI_for_difference, CI_adj_for_pre. But how can we report the difference of means, with its CI, when the data are paired? This is really a mean of differences rather than a difference of means, isn't it?