
Let's assume I have the following data set, consisting of pre (t1) and post (t2) measurements. I want to check whether there is a significant effect over time (t1 vs. t2).

d <- structure(list(pre = c(1, 2, 2.3, 1.3, 4.4, 3.4, 5.3, 2.2, 4.4, 
3.3, 5.5, 7, 8.8, 3.3, 6, 7.7, 2.2), post = c(2, 4.3, 7.1, 8.1, 
5.3, 4.3, 3.2, 5.7, 6.5, 4.4, 6.6, 7, 9.9, 6.6, 6, 6.6, 8.8), 
    diff = c(1, 2.3, 4.8, 6.8, 0.899999999999999, 0.9, -2.1, 
    3.5, 2.1, 1.1, 1.1, 0, 1.1, 3.3, 0, -1.1, 6.6)), .Names = c("pre", 
"post", "diff"), row.names = c(NA, -17L), class = "data.frame")

It was tempting to use the paired t.test:

> t.test(d$pre, d$post, paired = TRUE)

    Paired t-test

data:  d$pre and d$post
t = -3.1952, df = 16, p-value = 0.005634
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.1605916 -0.6394084
sample estimates:
mean of the differences 
                   -1.9
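
As a side note, the paired t-test is equivalent to a one-sample t-test on the within-subject differences, so the same result (with the sign flipped, since diff = post - pre) could also be obtained from the diff column directly:

t.test(d$diff)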

Then I quickly checked whether I could reproduce this using lm():

> summary(lm( (post-pre) ~ 1, data=d))

Call:
lm(formula = (post - pre) ~ 1, data = d)

Residuals:
   Min     1Q Median     3Q    Max 
  -4.0   -1.0   -0.8    1.4    4.9 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)   1.9000     0.5946   3.195  0.00563 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.452 on 16 degrees of freedom

OK, it works. Now I wanted to adjust for the pre value, as is accepted practice when analysing data from observational trials. I cannot use t.test any more, so I switched to lm(). First, I reshaped the data to the long format (no third-party packages):

dl <- reshape(data=d, idvar="subj", varying = c("pre", "post"), times=c("pre", "post"), v.name=c("value"), direction="long", new.row.names = 1:100)

With result:

> rbind(head(dl,3), tail(dl,3))
        diff time value subj
1.pre    1.0  pre   1.0    1
2.pre    2.3  pre   2.0    2
3.pre    4.8  pre   2.3    3
15.post  0.0 post   6.0   15
16.post -1.1 post   6.6   16
17.post  6.6 post   8.8   17

Now I also added the "pre" values to the data frame:

dl$pre <- c(d$pre, d$pre)
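
This relies on the long data being stacked as the complete pre block followed by the post block. An equivalent assignment that matches on the subject id created by reshape (and so does not depend on row order) would be:

dl$pre <- d$pre[dl$subj]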

And fitted the following mixed model:

> summary(lme4::lmer(value ~ time + pre + (1|subj), data=dl))
singular fit
Linear mixed model fit by REML ['lmerMod']
Formula: value ~ time + pre + (1 | subj)
   Data: dl

REML criterion at convergence: 126.5

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.2960 -0.5785 -0.1819  0.4949  2.5783 

Random effects:
 Groups   Name        Variance  Std.Dev. 
 subj     (Intercept) 1.097e-15 3.313e-08
 Residual             2.449e+00 1.565e+00
Number of obs: 34, groups:  subj, 17

Fixed effects:
            Estimate Std. Error t value
(Intercept)   3.3254     0.6244   5.326
timepre      -1.9000     0.5368  -3.539
pre           0.6543     0.1202   5.442

Correlation of Fixed Effects:
        (Intr) timepr
timepre -0.430       
pre     -0.794  0.000
convergence code: 0
singular fit
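
For the confidence intervals below, the fitted model is stored in an object (the same call as above):

mm <- lme4::lmer(value ~ time + pre + (1|subj), data = dl)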

Then, I calculated the CI for the mean difference (mm holds the result of lmer):

> confint(mm, method="Wald")
                 2.5 %     97.5 %
.sig01              NA         NA
.sigma              NA         NA
(Intercept)  2.1016860  4.5491895
timepre     -2.9521169 -0.8478831
pre          0.4186819  0.8899503
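
Wald intervals are the quickest option here; lme4 can also compute profile or bootstrap intervals, which are slower but do not rely on the Wald approximation:

confint(mm, method = "profile")
confint(mm, method = "boot")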

The CI for "timepre" is shorter than the CI returned by the paired t.test, which is expected. The estimate for the difference is (ignoring the sign) still 1.9, and the t statistic is now much larger (I believe because pre is included in the model).

Now, my question is: am I doing this properly? I just need a "paired t-test adjusted for pre" and a CI.

By the way, in clinical research data like this are often summarized as: mean_t1, mean_t2, difference, CI_for_difference, CI_adj_for_pre. But how can we report the difference of means, and a CI for that, if these are paired data? Isn't this rather the mean of differences?
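
A small sketch of such a summary for these data (note that the difference of the two means equals the mean of the paired differences, so only the CI really depends on the pairing):

mean_t1 <- mean(d$pre)
mean_t2 <- mean(d$post)
# difference of means; identical to mean(d$post - d$pre) for paired data
diff_means <- mean_t2 - mean_t1
# the CI for the difference should come from the within-subject differences
ci_diff <- t.test(d$post - d$pre)$conf.int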


1 Answer


What you have done here is correct! The confidence interval is smaller due to conditioning on baseline. The mistake a lot of people make is to regress the change on baseline, which is very bad due to mathematical coupling which can invoke bias.
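
To make the distinction concrete, here is a minimal sketch, just for illustration, of the two model forms in R: conditioning the follow-up on baseline versus regressing the change score on baseline (the pattern warned about above).

# Conditioning on baseline: follow-up value as outcome, baseline as covariate
m_cond   <- lm(post ~ pre, data = d)

# Regressing change on baseline: pre appears on both sides of the model,
# which induces mathematical coupling between outcome and covariate
m_change <- lm(I(post - pre) ~ pre, data = d)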

There are quite a few questions on Cross Validated with answers that go into this in depth. For example:
Is there a mathematical proof for change being correlated with baseline value

Also related is my answer here:
Difference between Repeated measures ANOVA, ANCOVA and Linear mixed effects model

Also, specifically regarding ANCOVA, see my answer here:
Is ANCOVA correct here?

For more background and further detail, this is a very good paper on the topic:
Tu YK, Gilthorpe MS. Revisiting the relation between change and initial value: a review and evaluation. Stat Med. 2007;26(2):443-457. doi:10.1002/sim.2538
https://pubmed.ncbi.nlm.nih.gov/16526009/

  • "which is very bad due to mathematical coupling which can invoke bias" It could be very good to expand this statement with more reasoning or references. – ttnphns Aug 04 '20 at 13:24
  • Asking because there also exists the opposite claim, that it is sensible to regress (or run ANCOVA on) the _change_ (the difference) on baseline. – ttnphns Aug 04 '20 at 13:28
  • @ttnphns ANCOVA can also be biased if subjects are not randomised to the groups. I wrote a long answer on this, with references, not long ago. I will try to find it and add a link to my answer. Thanks! – Robert Long Aug 04 '20 at 13:43
  • Thanks. If you collect more links relevant to the baseline controversy, please add them as well. Thanks again. – ttnphns Aug 04 '20 at 14:37
  • Change from baseline conditioned on baseline is common in clinical biostatistics and is even mentioned in regulatory guidelines. https://stats.stackexchange.com/questions/459279/is-there-any-formal-guideline-that-would-indicate-the-necessity-for-adjustment-f – Bastian Sep 25 '21 at 19:33
  • @Bastian indeed, it's a source of great frustration, along with the continued (mis)use of p-values/confidence intervals, reliance on incorrect assumptions, etc. – Robert Long Sep 26 '21 at 10:19
  • You are right, the industry is based almost entirely on NHST. The Bayesian approach is used here and there, mostly at trial design and in other supportive analyses, but the final submissions are largely done with p-values. "Outliers" happen; for example, Pfizer submitted their vaccine using the Bayesian framework, but that is the exception rather than the rule. The good news is that clinical significance must now be included in the analysis, so findings with p<0.000001 are disregarded as statistical artefacts (statistically detectable but unimportant) if the minimum clinically important difference is not met. – Bastian Sep 27 '21 at 10:47