I'm hoping you can help me find an analysis best suited for my purpose. I have a dataset structured like this:

dataset

Basically, subjects are measured twice and get a score each time. If the score remains roughly the same, the subject is categorized as stable. If the score crosses a certain threshold (say 1000), the subject is categorized as convert or revert. If the score stays on the same side of the threshold but changes considerably, the subject is categorized as increase or decrease. This is of course an arbitrary definition, and it is sometimes difficult to interpret because the number of days between timepoints varies (variable day).

Now, what I want to do is compare the intervention and the control group to see whether the score changes significantly and in which direction. The score data is not normally distributed. I've used Fisher's exact test for the categorical data, but it seems rather crude. I've also tried clustering the curves using the R package Hmisc and fitting a linear mixed model, but I'm not sure whether either is appropriate with only 2 timepoints, and I'm not familiar with these methods.

Which analysis do you think is most appropriate? Can you point me in the right direction?

Thanks in advance

1 Answer

Having only 2 time points is not a problem for mixed effects models, provided that you have a sufficient number of clusters (subjects in your case). Also, the data themselves do not need to be normally distributed; it is the distribution of the residuals that matters.

So for a linear mixed model you could start with this:

library(lme4)
model <- lmer(score ~ day*group + (1|id), data = mydata)

where day*group specifies the fixed effects: it means you are fitting main effects for both variables plus the interaction between them.

The main effect for day would be interpreted as the association between score and day (i.e. the slope) when group is at its reference level. The main effect for group will be the estimated difference in score between the intervention and control group when day is zero. The interaction will be interpreted as the difference in slopes between the control group and the intervention group. Together these should allow you to answer your research question:

what I want to do is compare between the intervention and the control group to see whether the score changes significantly and in which direction

The random intercept (1|id) accounts for the non-independence of the repeated measurements within each subject.
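To make the interpretation above concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration only: the number of subjects, the group labels, the slopes and the noise levels are made up, and the column names simply mirror those used in the formula above. It assumes the lme4 package is installed.

```r
library(lme4)  # provides lmer() and fixef(); assumed to be installed

set.seed(1)
n_subj <- 200  # hypothetical number of subjects

# Two rows per subject: first visit at day 0, second after a varying number of days
id    <- factor(rep(seq_len(n_subj), each = 2))
group <- factor(rep(sample(c("control", "intervention"), n_subj, replace = TRUE), each = 2),
                levels = c("control", "intervention"))  # control is the reference level
day   <- as.vector(rbind(0, sample(30:120, n_subj, replace = TRUE)))

# Made-up data-generating process: slope of 1 per day in control, 3 per day in intervention
subject_effect <- rep(rnorm(n_subj, sd = 100), each = 2)
slope <- ifelse(group == "intervention", 3, 1)
score <- 900 + slope * day + subject_effect + rnorm(2 * n_subj, sd = 50)

mydata <- data.frame(id, group, day, score)

model <- lmer(score ~ day * group + (1 | id), data = mydata)

# Fixed effects, in order:
#   (Intercept)            mean score in the control group at day 0
#   day                    slope per day in the control group
#   groupintervention      intervention minus control at day 0
#   day:groupintervention  difference in slopes (intervention minus control)
fixef(model)
```

In this sketch the coefficient named day:groupintervention is the one that speaks to whether the score changes differently over time in the two groups; summary(model) would show its standard error alongside the estimate.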

Depending on how large your dataset is, and whether the data would support a more complex model, you could also allow the fixed effects to vary by subject by fitting random slopes. For example, if you wanted only the effect of day to vary by id, with group and the interaction between them fixed, you would fit:

model <- lmer(score ~ day*group + (day|id), data = mydata)

You should think carefully about whether that makes sense in your particular field.
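Whether the data can support random slopes also depends on how many measurements each subject has. As a rough sketch (the subject count below is purely hypothetical), a (day|id) term estimates an intercept and a slope per subject, so with exactly two measurements per subject there are as many random effects as observations, and by default lme4 declines to fit such a model:

```r
# Hypothetical sizes, for illustration only
n_subjects      <- 400
obs_per_subject <- 2

n_obs <- n_subjects * obs_per_subject

# Random effects implied by each random-effects term
re_random_intercept <- n_subjects * 1  # (1 | id): one intercept per subject
re_random_slope     <- n_subjects * 2  # (day | id): intercept and slope per subject

# The number of observations should exceed the number of random effects for a term
n_obs > re_random_intercept  # holds for the random-intercept model
n_obs > re_random_slope      # does not hold for the random-slope model
```

With three or more measurements per subject the second comparison would hold, which is one reason random slopes are more commonly fitted to longer series.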

Robert Long
  • Thanks for your response, I've fitted the first model and it works. I do have some reading up to do on the interpretation of the results, so if you know where I can find a clear explanation, I would be much obliged. As to your second model, I've tried to use it but it throws an error (number of observations <= number of random effects for term). I think this is because I have only 2 timepoints? I think you can turn it off with lmerControl, but don't know if that's desirable. I have about 300 treatment subjects and 100 control subjects – Edifier8888 Jun 30 '20 at 08:08
  • @Edifier8888 [here](https://stats.stackexchange.com/questions/149621/how-should-i-implement-this-interaction-between-a-continuous-and-categorical-pre) is a good explanation of interactions, but you can also just search the internet for "interpretation of categorical by continuous interaction" and there are lots of resources. UCLA has some good ones. As for the more complex model, it seems that your data don't support it, which is fine. – Robert Long Jun 30 '20 at 09:23