Change Score or Regressor Variable Method - Should I regress $Y_1$ over $X$ and $Y_0$ or $(Y_1-Y_0)$ over $X$

Question

I have data on investment preferences one year before Covid and during the Covid lockdown.

Some changes appear using a simple t-test. I want to assess whether these changes are particularly strong for specific demographics (e.g., older individuals ($X_1$), individuals with lower income ($X_2$), etc.).

Should I include the initial level of my dependent variable in the regressions? Specifically, if I want to use OLS regressions to investigate which independent variables correlate with the change in my dependent variable, which model is preferable?

Model 1 (apparently called the Change Score Method): $(Y_2-Y_1) = \beta_1 \cdot X_1 + \beta_2 \cdot X_2$

Model 2 (apparently called the Regressor Variable Method): $Y_2 = \beta_1 \cdot X_1 + \beta_2 \cdot X_2 + \beta_3 \cdot Y_1$

Thank you so much for your help - Any reference would also be much appreciated!

Maybe a dup: https://stats.stackexchange.com/questions/3466/best-practice-when-analysing-pre-post-treatment-control-designs — kjetil b halvorsen, Jul 21 '20 at 01:33

rnso · Accepted Answer · 2020-07-22T01:39:25.303


Both methods have been used. See here for example. Which one to use depends on the question you want to answer. If you mostly want to talk about "change", you can use

(Y2-Y1) ~ X1 + X2            # (1)

The baseline (Y1) should not be added as a predictor to the above equation, because Y1 is part of the difference (Y2-Y1) and is therefore correlated with it by construction - see the comments below by @EdM and here.
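To illustrate why the baseline and the change score are tied together, here is a minimal simulation sketch. The data are entirely hypothetical (a stable trait plus independent measurement noise at each wave, with names such as `trait`, `y1`, `y2` chosen for illustration), so nothing truly changes between the two time points, yet the baseline still correlates negatively with the difference.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: both waves measure the same stable trait with
# independent noise, so there is no real change between Y1 and Y2.
trait = rng.normal(size=n)
y1 = trait + rng.normal(size=n)
y2 = trait + rng.normal(size=n)

# Change score and its correlation with the baseline.
change = y2 - y1
r = np.corrcoef(y1, change)[0, 1]
print(round(r, 2))
```

Under these assumed variances the theoretical correlation is -0.5, which is the regression-to-the-mean pattern: a high Y1 tends to be followed by a negative change purely because of noise.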

On the other hand, if you want to discuss factors affecting "final value", you can use

Y2 ~ X1 + X2 + Y1            # (2)
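One way to see how (1) and (2) relate, using the symbols from the question: moving Y1 to the right-hand side of the change score model gives

$$Y_2 - Y_1 = \beta_1 X_1 + \beta_2 X_2 \quad\Longleftrightarrow\quad Y_2 = \beta_1 X_1 + \beta_2 X_2 + 1 \cdot Y_1$$

so the change score model is the regressor variable model with the coefficient on the baseline fixed at $\beta_3 = 1$, whereas (2) estimates $\beta_3$ from the data.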

However, since repeated measurements (Y1 and Y2, at two time points) are taken on the same subject, a mixed model is also often used (including interactions, as commented by @dbwilson below):

Y ~ X1 + X2 + time + X1*time + X2*time + (1|subject)

The following simplified formula is effectively the same as the one above:

Y ~ X1*time + X2*time + (1|subject)            # (3)
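The link between the time interactions in (3) and the change score model (1) can be checked numerically. The sketch below uses simulated, hypothetical data and plain least squares on the long-format data, so it leaves out the random intercept `(1|subject)`; in a balanced two-wave design like this one, the random intercept should affect the standard errors rather than these point estimates, but treat that as an assumption to verify with a mixed-model fit.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical two-wave data with subject-level covariates x1 and x2.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
subject = rng.normal(size=n)  # subject-specific intercept
y1 = 1.0 + 0.5 * x1 + subject + rng.normal(size=n)
y2 = 1.5 + 0.8 * x1 - 0.3 * x2 + subject + rng.normal(size=n)

# Change score model: (y2 - y1) ~ 1 + x1 + x2
X_wide = np.column_stack([np.ones(n), x1, x2])
b_change, *_ = np.linalg.lstsq(X_wide, y2 - y1, rcond=None)

# Long format: one row per subject and time point (time = 0, then 1).
time = np.repeat([0.0, 1.0], n)
x1_long = np.tile(x1, 2)
x2_long = np.tile(x2, 2)
y_long = np.concatenate([y1, y2])

# y ~ 1 + x1 + x2 + time + x1:time + x2:time
X_long = np.column_stack(
    [np.ones(2 * n), x1_long, x2_long, time, x1_long * time, x2_long * time]
)
b_long, *_ = np.linalg.lstsq(X_long, y_long, rcond=None)

# Interaction coefficients versus change score slopes.
print(b_long[4:6], b_change[1:])
```

The two printed pairs agree up to floating-point error, and the `time` coefficient matches the change score intercept. Note that the main effects of X1 and X2 in the long model describe the baseline wave, not the change.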

There is another method commonly used, especially in biomedical literature: "Percent change", i.e.

(100*(Y2-Y1)/Y1) ~ X1 + X2            # (4)

It is not correct to keep Y1 as a predictor variable in this last method as there will be strong correlation between baseline and percent change.
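The same point can be sketched for percent change with simulated data. Everything here is hypothetical: a positive-valued measure around 50 with independent noise at each wave and no true change. Even so, the baseline correlates clearly and negatively with the percent change.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Hypothetical positive-valued measure: stable level plus independent
# noise at each wave, with no real change over time.
level = 50 + rng.normal(scale=5, size=n)
y1 = level + rng.normal(scale=5, size=n)
y2 = level + rng.normal(scale=5, size=n)

# Percent change from baseline, as in (4).
pct_change = 100 * (y2 - y1) / y1
r_pct = np.corrcoef(y1, pct_change)[0, 1]
print(round(r_pct, 2))
```

Because Y1 appears both in the numerator and in the denominator, adding it as a predictor of percent change would mostly pick up this built-in relationship.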

I think this last method (percent change) is the easiest to understand.

See here for more information on this topic.

edited Jul 22 '20 at 01:39

answered Jul 21 '20 at 01:16

rnso


Thank you so much for this detailed answer. In the end, given that I was mostly interested in change, I used (Y2-Y1) ~ X1 + X2. It is, however, interesting to see the last two methods you propose. Thank you again! – L. M. Jul 21 '20 at 10:30
Regressing the difference against the initial value is not a good idea. See [this answer](https://stats.stackexchange.com/a/476445/28500) and its links and [this answer](https://stats.stackexchange.com/a/476453/28500) to the question ["What are the worst (commonly adopted) ideas/principles in statistics?"](https://stats.stackexchange.com/q/476424/28500) – EdM Jul 21 '20 at 11:21
I have added a note regarding this in answer above. – rnso Jul 21 '20 at 11:30
In the mixed model, the X1*time and X2*time interactions estimate the same effects as the X1 and X2 effects in the change score model. The code, however, should be Y~X1+X2+time+X1*time+X2*time + (1|subject). – dbwilson Jul 21 '20 at 11:55
I have added this in answer above with your reference. – rnso Jul 21 '20 at 12:35
