I have longitudinal data in which a categorical response is collected at two time points. I was wondering whether it is possible to include the baseline categorical response as a predictor and fit a logistic regression model. The variables I have are Y1 = response at time 1; Y2 = response at time 2; X1 = age; X2 = a derived variable. The model would therefore look like `Y2 = a + b*Y1 + c*X1 + d*X2`. Could you please tell me whether using this model is mathematically correct? Thanks in advance.
1 Answer
Assuming that you are talking about a 2-level categorical response (e.g., 2-alternative forced choice, let's say "No=0/Yes=1"), that model is "mathematically correct." The question is whether the model represents what you intend. Your model says that the log-odds of a "Yes" response at Time 2 have a contribution proportional to whether the choice was "Yes" at Time 1 (plus additive contributions proportional to age and to your "derived variable"). If that's what you intend, then go with it.
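
Written out on the log-odds scale, and reusing the coefficient names $a, b, c, d$ from the question, the model described above would be (a sketch assuming the binary coding No=0/Yes=1 for both $Y_1$ and $Y_2$):

$$
\log\frac{P(Y_2 = 1)}{1 - P(Y_2 = 1)} = a + b\,Y_1 + c\,X_1 + d\,X_2
$$

Under that coding, $e^{b}$ is the odds ratio for a "Yes" at Time 2 comparing those who answered "Yes" at Time 1 with those who answered "No", holding $X_1$ and $X_2$ fixed.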

EdM
Thank you for your response. Yes, the model represents what I intend and is formulated in a way that makes sense. I was worried about whether it might lose statistical properties or violate any assumptions. And yes, the response is binary. – curiousmind Jul 15 '20 at 04:11
@curiousmind Including a baseline value as a predictor of future _values_ is common practice. It's not good as a predictor of _changes_ in values. See [this link](https://stats.stackexchange.com/a/476445/28500) for some brief discussion. If your data are unbalanced there might be some bias in this approach, but bias is often a problem in observational studies. Furthermore, there's also a [bias](https://stats.stackexchange.com/q/113766/28500) if you leave out _any_ predictor associated with outcome from a logistic regression model, so including the baseline value makes sense on balance. – EdM Jul 17 '20 at 16:28
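
A small numeric illustration of the omitted-predictor point in the last comment may help. This is only a sketch under stated assumptions: the coefficients are made-up values, `x` stands for some binary predictor of interest, and `z` stands for an omitted binary predictor (such as a baseline response) that is independent of `x` and split 50/50. None of these numbers come from the thread. Even though `z` is unrelated to `x`, averaging over it pulls the odds ratio for `x` toward 1.

```python
import math

def expit(z):
    """Inverse logit: map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

# Hypothetical coefficients (assumptions for illustration only):
# logit P(Y=1) = a + b*x + c*z
a, b, c = 0.0, 1.0, 2.0

def marginal_prob(x):
    """P(Y=1 | x) after averaging over z, assuming z is 0 or 1
    with probability 0.5 each and independent of x."""
    return 0.5 * (expit(a + b * x + c * 0) + expit(a + b * x + c * 1))

# Odds ratio for x when z is in the model (holding z fixed)
conditional_or = math.exp(b)

# Odds ratio for x when z is left out of the model
marginal_or = odds(marginal_prob(1)) / odds(marginal_prob(0))

print(f"odds ratio for x with z included: {conditional_or:.3f}")
print(f"odds ratio for x with z omitted:  {marginal_or:.3f}")
```

With these assumed values the odds ratio with `z` omitted comes out smaller than the one with `z` included, which is the attenuation the linked question discusses.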