Modeling temporal change with 2 data points

Question

Study Context:

I am studying the relationship between a biomarker of cellular aging (telomere length) and menopause.

We have 2 datapoints for a subset of the women. Therefore, our participants could fall into the following categories:

- Pre-menopausal at baseline and post-menopausal at follow-up
- Post-menopausal at baseline and post-menopausal at follow-up
- Only one measurement (baseline)
- Only one measurement (follow-up)

We also have the timing of menopause, and when the samples were taken. Thus, one of our questions is "Does going through menopause (and how long since) affect the rate of change in telomere length between baseline and follow-up?"

Modeling:

I've come across 3 approaches that might work.

1. Autoregression
2. Difference score
3. Linear mixed effects model (with participant as a random factor)

I've been convinced about the advantages of linear mixed effects models to model change over time. However, our data only has 2 time points. My understanding from Singer and Willett 2003 is that fitting LMEs to 2 time points doesn't really make sense (because the intra-individual correlation will always be 1 and the error 0). For this reason, I've sort of abandoned thoughts of using LMEs.

I found good introductions to autoregression and difference-score methods here. This led me to believe that the difference-score approach is superior, because it addresses questions about intra-individual change (as we are trying to do).

However, there is published work showing that 'controlling' for baseline telomere length in delta models (autoregression and difference-score) biases estimates when measurement error is non-zero and can lead to false positives (if the baseline values are already quite different). Based on this study, LMEs seem to perform well.

Stats questions:

1. Are LMEs a bad choice to model data with only two time points like this?
2. If LMEs are acceptable, how would I model the timing of menopause relative to the first and second measurement? Because each participant has 2 rows, I assume I would need to factor the timing of menopause into the model for both rows somehow.
3. If LMEs are not acceptable, is there a way to model deltas (change) without including the baseline measurement in the model, which appears to bias estimates?

I've used nlme and lmer, so bonus for helping me figure this out using R!

A [paper you cite](https://royalsocietypublishing.org/doi/full/10.1098/rsos.190937) says "to avoid invalid inference, models of telomere attrition [telomere-length (TL) difference] should not control for baseline TL by including it as a covariate." That's a well known problem in general; see [this answer](https://stats.stackexchange.com/a/481452/28500) and its links. Regressing a _change_ against a baseline can be problematic. Regressing a final value against an initial value is OK. See [this question](https://stats.stackexchange.com/q/482664/28500), closely related to yours. — EdM, Oct 23 '21 at 21:41
Thanks for taking the time to comment @EdM. If I can model 2 time points with LMEs, do you have thoughts on question 2? (How would one model time difference, which is a single measure, when you have 2 rows for each participant)? Write the time difference once in each row (duplicated)? Or some other way I'm not thinking of? Also, any idea why it's so common to use (and [promote](https://quantdev.ssri.psu.edu/sites/qdev/files/05_TwoOccasionChange-WISC-_2020.html)) the difference-score model as a way to model intra-individual change? I may need to push back on co-authors/reviewers. — Calen, Oct 24 '21 at 17:54

EdM · Accepted Answer · 2021-10-25T14:22:04.277

You seek to model the rate of change (e.g., base pairs per year) of telomere length, and whether menopause affects that rate of change. So why not model the rate of change directly, taking into account the time elapsed pre- and post-menopause between measurements for each individual? Then you reduce your data set to just one row per individual and avoid most of the problems you otherwise anticipate.
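As a minimal sketch of that reduction to one row per individual, assuming a long-format table with two rows per participant: the data frame, column names, and values below are all hypothetical, not taken from the study. The time between samples is split at each woman's age at menopause, so someone who is post-menopausal at both visits gets zero pre-menopausal years.

```r
# Hypothetical long-format data: two rows per participant (names and values are illustrative).
long <- data.frame(
  id            = c(1, 1, 2, 2, 3, 3),
  visit         = c("baseline", "followup", "baseline", "followup", "baseline", "followup"),
  age_at_sample = c(48.0, 54.0, 55.0, 60.5, 50.0, 57.0),
  telomere_bp   = c(7200, 7050, 6900, 6790, 7100, 6880),
  age_menopause = c(51.0, 51.0, 52.0, 52.0, 53.5, 53.5)
)

# One row per participant: join the baseline and follow-up rows on id.
# merge() is an inner join here, so anyone with a single measurement drops out.
base   <- long[long$visit == "baseline", ]
follow <- long[long$visit == "followup", ]
wide   <- merge(base, follow, by = c("id", "age_menopause"),
                suffixes = c("_base", "_follow"))

# Outcome: intra-individual change in telomere length.
wide$telomere_length_difference <- wide$telomere_bp_follow - wide$telomere_bp_base

# Clamp the menopause age into the [baseline, follow-up] window, then split
# the interval between samples into pre- and post-menopausal years.
cut_age <- pmin(pmax(wide$age_menopause, wide$age_at_sample_base),
                wide$age_at_sample_follow)
wide$pre_menopause_elapsed_time  <- cut_age - wide$age_at_sample_base
wide$post_menopause_elapsed_time <- wide$age_at_sample_follow - cut_age

wide[, c("id", "telomere_length_difference",
         "pre_menopause_elapsed_time", "post_menopause_elapsed_time")]
```

Because the time split is computed once per participant, there is no need to duplicate the time difference across two rows.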

There is absolutely no problem with modeling the intra-individual difference in telomere length as the outcome. That by itself provides a very useful control for the telomere length at first measurement, similar to a paired t-test. The problem is if you try to regress that difference against the initial telomere length as a predictor. So don't do that, despite what some web sites might promote. In your own field, that approach has already been shown to be invalid, as you note.*
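A small simulation can illustrate one mechanism behind that warning. This is a sketch under made-up assumptions (a true-length SD of 600 bp, a measurement-error SD of 300 bp, and no true change at all), not a reproduction of the cited paper's analysis. With measurement error in both samples, the difference is negatively correlated with the measured baseline even though nothing changed.

```r
set.seed(1)
n <- 5000
true_sd  <- 600   # assumed between-person SD of true telomere length (bp)
error_sd <- 300   # assumed measurement-error SD (bp)

true_length <- rnorm(n, mean = 7000, sd = true_sd)
baseline    <- true_length + rnorm(n, sd = error_sd)
followup    <- true_length + rnorm(n, sd = error_sd)  # no true change
difference  <- followup - baseline

# Difference as the outcome, no baseline predictor: mean change is near 0.
fit_ok <- lm(difference ~ 1)
coef(fit_ok)

# Difference regressed on measured baseline: a spurious negative slope.
fit_bad <- lm(difference ~ baseline)
coef(fit_bad)["baseline"]

# Slope expected from the error variance alone under these assumptions.
expected_slope <- -error_sd^2 / (true_sd^2 + error_sd^2)
expected_slope
```

Under these assumptions the expected slope is -0.2, which arises purely from regression to the mean.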

To evaluate the rate of change, you include elapsed_time between measurements, in some form, as a predictor. Putting aside the menopause issue for the moment, a coefficient of elapsed_time, with length difference as the outcome, is then an estimate of the rate of change of telomere length over time.
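One way to see why the coefficient is a rate, and why the difference outcome already accounts for each woman's starting level, is to write out a simple trajectory. The notation is an assumption introduced here for illustration: $a_i$ is an individual-specific level, $r$ a common linear rate, $t_{ij}$ the time of sample $j$, and $e_{ij}$ measurement error.

$$
\begin{aligned}
TL_{ij} &= a_i + r\,t_{ij} + e_{ij}, \qquad j = 1, 2 \\
TL_{i2} - TL_{i1} &= r\,(t_{i2} - t_{i1}) + (e_{i2} - e_{i1})
\end{aligned}
$$

The individual level $a_i$ cancels in the difference, the slope on elapsed time $t_{i2} - t_{i1}$ is the rate $r$, and no intercept term remains.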

To evaluate the role of menopause, you extend that approach to include separate predictors for `pre_menopause_elapsed_time` and `post_menopause_elapsed_time` (written with underscores so that they are valid variable names in an R formula). If the rate of change differs post-menopause, then those coefficients should differ.

To this point, the model is simple:

`telomere_length_difference ~ pre_menopause_elapsed_time + post_menopause_elapsed_time`

Now, think about the intercept of this model. It will be the predicted `telomere_length_difference` with `pre_menopause_elapsed_time` and `post_menopause_elapsed_time` both at values of 0. Ideally, that intercept should thus be close to 0. If not, you need to rethink the assumptions underlying the model, for example whether telomere length decreases linearly with time both pre- and post-menopause. Or might menopause itself lead to something like a step-change in telomere length that also needs to be modeled?
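The model and the intercept check might be sketched in R as follows. R names cannot contain hyphens, so the predictors use underscores. The data are simulated with assumed rates of -20 and -35 bp/year purely for illustration. The comparison against a single common rate is one possible way to test whether the two coefficients differ, not something the answer prescribes.

```r
set.seed(42)
n <- 400

# Hypothetical data: years elapsed before and after menopause between samples.
d <- data.frame(
  pre_menopause_elapsed_time  = runif(n, 0, 6),
  post_menopause_elapsed_time = runif(n, 0, 6)
)
# Assumed true rates (bp/year), chosen only for this illustration.
d$telomere_length_difference <- -20 * d$pre_menopause_elapsed_time -
  35 * d$post_menopause_elapsed_time + rnorm(n, sd = 60)

# Separate rates before and after menopause.
fit <- lm(telomere_length_difference ~ pre_menopause_elapsed_time +
            post_menopause_elapsed_time, data = d)
summary(fit)$coefficients

# Intercept check: the interval should be compatible with 0 if the
# linear-change assumptions are reasonable.
confint(fit)["(Intercept)", ]

# Do the two rates differ? Compare with a model forcing one common rate.
fit_common <- lm(telomere_length_difference ~
                   I(pre_menopause_elapsed_time + post_menopause_elapsed_time),
                 data = d)
rate_test <- anova(fit_common, fit)
rate_test
```

An intercept far from 0 in real data would be the cue to revisit linearity or to consider a step change at menopause.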

Your ultimate model might also need to account for other factors associated with telomere length changes. That will require some care in how to model those other factors, whether to examine their interactions with either or both of the elapsed_time predictors, etc, based on your understanding of the subject matter. Depending on how you model the other predictors, the simple interpretation of the intercept presented above might not hold. But this simple model should get you pointed in a helpful direction.

*If someone thinks that changes occur in proportion to initial values, an assumption that might lead some to regress differences against initial values, then perhaps the outcome should be the difference in logs of telomere lengths, equivalent to the log-length-ratio, again without the initial length as a predictor.
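A brief sketch of that log-scale outcome, using made-up measurements. The vector names and the data frame `wide` mentioned in the comment are hypothetical.

```r
# Hypothetical paired measurements in base pairs (values are made up).
baseline_bp <- c(7200, 6900, 7100, 6500)
followup_bp <- c(7050, 6790, 6880, 6400)

# The difference in logs equals the log of the follow-up/baseline ratio.
log_length_ratio <- log(followup_bp) - log(baseline_bp)
same_quantity <- all.equal(log_length_ratio, log(followup_bp / baseline_bp))
same_quantity

# This outcome can replace the raw difference in the same kind of model,
# still without baseline length as a predictor, for example:
# lm(log_length_ratio ~ pre_menopause_elapsed_time +
#      post_menopause_elapsed_time, data = wide)
# exp(coefficient) - 1 is then the proportional change per year implied by the fit.
```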

Thank you @EdM. That approach seems appropriate and simple enough to implement, interpret, and explain to reviewers and co-authors! I agree about the inclusion of other covariates. For those other covariates (e.g. BMI), would you then also include e.g. BMI_difference? Or simply include baseline BMI and call it a day? What about in cases where we are actually interested in whether the change (say in BMI, for example) is associated with the change in TL? Thanks again for your help - I've had a hard time getting clear answers about modeling longitudinal data like this with only 2 points. — Calen, Oct 24 '21 at 21:47
@Calen you have to use your understanding of the subject matter, and the limits imposed by the size of your data set, to choose how best to model covariates--whether as baseline values or changes. Inclusion of covariates might complicate the simple interpretation above of the intercept, however. Also, you should decide whether to include interactions of the covariates with either or both of the `elapsed_time` predictors. Those decisions should be based on what you and your colleagues and your audience think is important to evaluate. The statistical issue is to avoid overfitting your data. — EdM, Oct 24 '21 at 23:30
Yes, that all makes sense. The approach you suggest is much easier to get my head around, and I see how the details about the covariates are a separate issue. Thanks again @EdM. — Calen, Oct 25 '21 at 05:04
