
I have one independent continuous, time-dependent variable X, measured repeatedly (1 to 4 times) per patient over some period. My dependent variable Y is binary and is observed at the end of that period (e.g., a diagnosis at the end of the 3 months during which X was measured).

My first step would be to properly model the time-dependency of X with a GLMM, using random effects to account for the repeated measures, but this fails because there are too few observations per patient (25% of patients have only one measurement).

So I am caught between:
1.) a simple logistic regression with only one data point per patient (or an aggregated version of X), which is a shame in terms of power loss, and
2.) a proper model of X with random effects accounting for the repeated measures.

I have read this post, which seems to cover a related topic: Which statistical model to use when trying to find the beginning of a time-dependent increase

Is it sound to fit a logistic regression treating all data points as independent, even though most of them are not, and then adjust the standard errors with a function like robcov() from the rms package? Can we obtain "adjusted" coefficients and a model usable for new predictions with this approach?

Thanks in advance.

Esculape
  • I do not see any issues with your option 2 or with doing the analysis using robust standard errors. I doubt whether in a practical situation there will be much difference. You will also need a variable for time of course - actual time if they are not four fixed time points. I would not do option 1 unless forced to. – mdewey May 06 '19 at 12:50
  • Thanks for your reply. The problem with option 2 is that I end up having more random effects than the number of observations if I include both a random intercept and a random slope, so the variance is unidentifiable (`lmer` throws an error). Yes I have a proper variable for time. – Esculape May 06 '19 at 13:06
  • Why do you need to include random slopes? – Robert Long May 06 '19 at 13:31
  • Also, you could try a model in which the random intercepts and random slopes are uncorrelated, i.e., with a diagonal matrix for the random effects. – Dimitris Rizopoulos May 06 '19 at 13:45
  • If 25% have only one observation and presumably many have only two I would suggest a random slope is not necessary as @RobertLong hinted. – mdewey May 06 '19 at 13:57
  • Think of it this way: when you specify random slopes you are asking the software to estimate a slope for each patient. For those patients with only 1 observation, what would their slopes be ? This is one reason why the random effects are not identified. – Robert Long May 06 '19 at 14:14
  • Thanks for your suggestions. I understand that the software is indeed unable to estimate a slope from a single data point. I am not yet familiar with the need for a random slope in a context with few data points per individual. From visual plots of my data, it seems that the two outcome groups have different temporal trends in the variable ***X***, one with a strong increase and the other with a slower increase. That is why I thought a common fixed slope would be inappropriate. – Esculape May 06 '19 at 15:20
  • It would be a good idea to include the plots in your question. – Robert Long May 12 '19 at 14:09
  • @RobertLong indeed it is more challenging to estimate slopes for individuals with a single observation, but theoretically, and practically in the software, you do get slope estimates for them. – Dimitris Rizopoulos May 18 '19 at 19:02

0 Answers