
I have one independent continuous, time-dependent variable X, measured repeatedly (1 to 4 times) per patient over some period. My dependent variable Y is binary and is observed at the end of that period (e.g., a diagnosis at the end of the 3 months during which X was measured).

My first step would be to properly model the time-dependency of X with a GLMM, using random effects to account for the repeated measures, but this fails because there are too few observations per patient (25% of patients have only one measurement).

So I am caught between:
1.) a simple logistic regression with only one data point per patient (or an aggregated version of X), which is a shame in terms of power loss, and
2.) a proper model of X with random effects accounting for the repeated measures.

I have read this post, which seems to cover a related topic: Which statistical model to use when trying to find the beginning of a time-dependent increase

Is it sound to fit a logistic regression treating all data points as independent, even though most of them are not, and then adjust the standard errors with a function like robcov() from the rms package? Can we obtain "adjusted" coefficients and a model usable for new predictions with this approach?

Thanks in advance.

Esculape
  • I do not see any issues with your option 2 or with doing the analysis using robust standard errors. I doubt whether in a practical situation there will be much difference. You will also need a variable for time of course - actual time if they are not four fixed time points. I would not do option 1 unless forced to. – mdewey May 06 '19 at 12:50
  • Thanks for your reply. The problem with option 2 is that I end up having more random effects than the number of observations if I include both a random intercept and a random slope, so the variance is unidentifiable (`lmer` throws an error). Yes I have a proper variable for time. – Esculape May 06 '19 at 13:06
  • Why do you need to include random slopes? – Robert Long May 06 '19 at 13:31
  • Also, you could try a model in which the random intercepts and random slopes are uncorrelated, i.e., with a diagonal matrix for the random effects. – Dimitris Rizopoulos May 06 '19 at 13:45
  • If 25% have only one observation and presumably many have only two I would suggest a random slope is not necessary as @RobertLong hinted. – mdewey May 06 '19 at 13:57
  • Think of it this way: when you specify random slopes you are asking the software to estimate a slope for each patient. For those patients with only 1 observation, what would their slopes be ? This is one reason why the random effects are not identified. – Robert Long May 06 '19 at 14:14
  • Thanks for your suggestions. I understand that the software is indeed unable to estimate a slope from a single data point. I am not yet familiar with the need for a random slope in a context with few data points per individual. From visual plots of my data, it seems that the two outcome groups have different temporal trends in the variable ***X***, one with a strong increase and the other with a slower increase. That is why I thought a common fixed slope would be inappropriate. – Esculape May 06 '19 at 15:20
  • It would be a good idea to include the plots in your question. – Robert Long May 12 '19 at 14:09
  • @RobertLong indeed it is more challenging to estimate slopes for individuals with a single observation, but theoretically, and practically in the software, you do get slope estimates for them. – Dimitris Rizopoulos May 18 '19 at 19:02

0 Answers