4

I would like to fit a mixed effect model to the following dataset, but I am having difficulties figuring out the best way to define the random effects.

For each subjects (ratID, N=10), I measure the same variable cc_marg using six different instruments (the six levels of the factor mPair) for each cycle of observations (cycles $c=1...N_c$, $N_c>>N$, $N_c$ different across subjects). For each cycle, the six measurements are taken simultaneously (therefore these measurements are correlated by the cycle at which they are taken). For each subject, I repeat this experiment three times (in random order across subjects), one for each level of the controlled variable spd_des (factor with three levels). I am interested in studying the effect of spd_des and mPair (and their possible interaction) on the variable cc_marg. I am not interested in the effect of cycle on the output variable.

There are two sources of randomness: ratID and cycles. However, I am confused on how to nest the latter into the former. There are several cycles for each subject, which would make me think that I should simply need to do ~1|ratID/cycle. However, the cycles obtained at a given level of spd_des (for each subject) are unrelated to those obtained at another level (even though they have the same identifiers $c=1...N_c$). Should I then nest cycle within spd_des within ratID, using ~1|ratID/spd_des/cycle? If I do so, however, I am also defining a random effect of spd_des, which I wasn't planning on doing actually. How do you think I should define the random effects in this design? (this is my main question).

If I do not nest cycle, I obtain unreasonably high numbers of denominator DF when I run anova, increasing the probability of false positive results. Here are the results if I do not nest:

> linM3 <- lme(cc_marg ~ mPair*spd_des , random = ~1|ratID, data=dat_trf, na.action=na.omit, method = "ML", control=lCtr )
> anova.lme(linM3,type="marginal")

              numDF denDF   F-value p-value
(Intercept)       1 14540  128.5679  <.0001
mPair             5 14540 2405.9828  <.0001
spd_des           2 14540    5.4406  0.0043
mPair:spd_des    10 14540   42.7502  <.0001

If I nest cycle within ratID, I obtain:

> linM3n <- lme(cc_marg ~ mPair*spd_des , random = ~1|ratID/cycle, data=dat_trf, na.action=na.omit, method = "ML", control=lCtr )
> anova.lme(linM3n,type="marginal")

              numDF denDF   F-value p-value
(Intercept)       1 12843  128.7659  <.0001
mPair             5 12843 2563.1850  <.0001
spd_des           2 12843    5.0572  0.0064
mPair:spd_des    10 12843   43.9206  <.0001

If I nest cycle within spd_des within ratID, I obtain:

> linM3n4 <- lme(cc_marg ~ mPair*spd_des , random = ~1|ratID/spd_des/cycle, data=dat_trf, na.action=na.omit, method = "ML", control=lCtr )
> anova.lme(linM3n4,type="marginal")

              numDF denDF   F-value p-value
(Intercept)       1 11503  120.7824  <.0001
mPair             5 11503 2803.9750  <.0001
spd_des           2    15    0.8420  0.4502
mPair:spd_des    10 11503   35.2944  <.0001

The results between the first and the second model above are not very different, but the third model provides different results in terms of spd_des. It is therefore important choosing the right model. How should I define the random effects considering the experimental design and the research question? Thanks!

[UPDATE]

I have tried to create a variable 'exp_spd' that keeps track of the experimental session. As stated, there is one experimental session for each level of spd_des, but the order in which I performed the experiments is randomized across subjects. The model is as follows:

linM1n <- lme(cc_marg ~ mPair*spd_des , random = ~1|ratID/exp_spd/cycle, data=dat_trf, na.action=na.omit, method = "ML", control=lCtr )
anova.lme(linM1n,type="marginal")
              numDF denDF   F-value p-value
(Intercept)       1 11528  122.3557  <.0001
mPair             5 11528 2802.2565  <.0001
spd_des           2 11528    0.3990   0.671
mPair:spd_des    10 11528   35.1272  <.0001

In terms of significance of the fixed-effect, the results are equivalent to the model linM3n4 above, where I nested spd_des instead of exp_spd. However, the denDF are different. In particular, the denDF of spd_des changes drastically. Shouldn't it be the same, given that each level of exp_spd is associated to only one level of spd_des (and vice-versa)? This issue is very obscure to me, and any help is highly appreciated.

Cristiano
  • 507
  • 2
  • 11

1 Answers1

3

It seems to me that the random grouping factors in your study are:

  • rat;
  • experiment;
  • cycle.

So the "random" portion of your model would be listed as random = ~1|ratID/experimentID/cycleID.

The predictor variable spd_des is an experiment-level predictor which can appear in the fixed effects portion of your model.

Usually, people advise that you should have at least 5 levels for a random grouping variable in your model, so that you can meaningfully estimate the standard deviation (or variance) of the random effects associated with that random grouping variable. However, you only have 3 such levels for the experiment variable. If this poses an issue for the estimation of the standard deviation (or variance) of the random intercepts associated with the experiment variable, you will need to respecify your model.

Isabella Ghement
  • 18,164
  • 2
  • 22
  • 46
  • 1
    Thanks! I do not really have an `experimentID` variable. I repeat the measurements three times to assess the three levels of `spd_des`. So basically, for each "experiment" (group of cycles), the value of `spd_des` will be different. I could create an experimentID variable, but wouldn't that be equivalen of doing `random = ~1|ratID/spd_des/cycle`? – Cristiano May 17 '19 at 15:22
  • 1
    No, it wouldn't be equivalent, because the experimentID would be taking values like "1", "2", "3", and appear in the random portion of your model to keep track of which experiment you are dealing with, whereas the spd_des could take the same values, though these would have a totally different meaning, and would be listed in the fixed effects portion of your model (it's a predictor whose value you measure for each experiment). There is a big difference between a random grouping factor and a predictor variable you are measuring for each level of that random grouping factor! – Isabella Ghement May 17 '19 at 15:27
  • Ok thanks! Are you suggesting that the values of experimentID should increase sequencially irrespective of the value of spd_des? So for example experimentID could have values 1,2,3,4, 5,6 (with repetitions of these values across subjects) for values of spd_des A,B, C, A, B, C. Or are you suggesting that I should keep the an equivalence between experimentID and spd_des? For example, experimentID=1,2,3,1,2,3 for values of spd_des=A, B, C, A, B, C. – Cristiano May 17 '19 at 15:35
  • 1
    I would suggest to number the experiments for each rat in the exact order in which you performed them - so that 1 corresponds to the first experiment for that rat, 2 corresponds to the second experiment, etc. It was my understanding that you had 3 experiments per rat? But that each experiment included multiple cycles? That's what your initial post seemed to suggest. – Isabella Ghement May 17 '19 at 15:51
  • 1
    ok, sorry I misunderstood. Yes, you are right: there are three experiments for subject (one for each level of `spd_des`), and within each experiment there are multiple cycles. In truth, there are 6 experiments per rat; three of them are to assess the effect of `spd_de`s, and the other three to assess three levels of another variable `inc_des`. I was going to study `spd_des` and `inc_des` in two different models because I am not evaluating all the possible combinations and I am not interested in their interactions. The experiments are randomized across the level of both variables though. – Cristiano May 17 '19 at 16:31
  • That said, do you suggest to define the variable 'experimentID' to keep track of the order of all six experiments, even if I evaluate 'spd_des' and 'inc_des' in two different models? This would mean that when I fit the model for, say, 'spd_des' I could have for one subject, experimentID=2,5,3, and for another subject experimentID=1,6,2. Is this a problem? – Cristiano May 17 '19 at 16:35
  • I guess you just need a convenient way to label each experiment performed for a rat. If the experiments used to evaluate inc_des were performed by setting the value of sps_des to some specific value, it doesn't seem right to me to discard them from your model. I think they should be included in the model, which would then allow you to include fixed main effects for inc_des and sps_des in your model. – Isabella Ghement May 17 '19 at 17:28
  • 1
    The names you use for the experiments are just labels but these labels should have a clear meaning - whether you call an experiment 4, D or IV for a rat, you should be able to describe what differentiates that experiment from the other experiments you conducted for that rat. The label is something that helps you distinguish between experiments, while also characterizing the nature of each experiment. – Isabella Ghement May 17 '19 at 17:30
  • ok In understand! Regarding the model. For the experiments where `inc_des` varies, `spd_des` is kept at one of the three levels. For the experiments where `spd_des` varies, `inc_des` is kept at one of the three levels. I did not cover all the possible combinations of `spd_des` and `inc_des`; that's why I was going to use different models. But if that's wrong, then I could include both of them in a single model. However, it would not make sense to include an interaction term between the two, as I do not have data to evaluate it. Shall I define fixed effect as `spd_des*mPair+inc_des*mPair`? – Cristiano May 17 '19 at 18:08
  • 1
    Great! Just as I thought, the experiments where you investigated inc_des contain information about spd_des and the other way around. This suggests that all experiments should indeed be included in the same model. You are right that you can't investigate the interaction between inc_des and spd_des in your model, since you haven't considered all possible combinations of levels for these two factors in your experiments. If it makes sense to consider the interaction between each of these two factors and mPair, then what you defined in terms of fixed effects makes sense. – Isabella Ghement May 17 '19 at 18:14