
I have repeated-measures data from n_subjects subjects, each with n_obs measurements before and after an intervention, and there are two experimental groups. I am interested in the influence of group on the change from pre to post, so the model looks something like

y ~ time + group:time

and I am interested in the group:time interaction.

time is a numeric column coded 0 for the pre timepoint and 1 for the post timepoint. group is a factor.
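For concreteness, a minimal sketch of how data with this structure might be generated (the parameter values and effect sizes here are hypothetical, not taken from the actual simulation, and per-subject random effects are omitted for brevity):

```r
set.seed(42)

n_subjects <- 20   # subjects per group (hypothetical)
n_obs      <- 10   # observations per subject per timepoint (hypothetical)

# design grid: each subject measured n_obs times pre (time = 0) and post (time = 1)
d <- expand.grid(
  subject = factor(seq_len(2 * n_subjects)),
  time    = c(0, 1),
  rep     = seq_len(n_obs)
)
d$group <- factor(ifelse(as.integer(d$subject) <= n_subjects, "1", "2"))

# log-normal outcome: group modifies the pre-to-post change in the
# within-subject SD on the log scale (the group:time effect of interest)
mu_log <- 2 + 0.3 * d$time
sd_log <- exp(log(0.5) + 0.2 * d$time + 0.3 * d$time * (d$group == "2"))
d$y    <- rlnorm(nrow(d), meanlog = mu_log, sdlog = sd_log)
```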

One of the models, whose performance I otherwise like, has 95% CI coverage that decreases as the number of observations at each timepoint grows, but does not change with the number of subjects.

[Figure: coverage of the 95% CI plotted against n_obs, by n_subjects]

(Horizontal line is at y = 0.95, where the coverage ought to be. There are 150 simulations for each combination of n_subjects and n_obs. Error bars represent Monte Carlo standard error.)

I am struggling to interpret this and to understand what could be changed about the model to improve the coverage. What information is the model missing that leads it to be overly precise?

The model is not biased in either direction, and ModSE/EmpSE is > 1 but trends towards 1 as both n_subjects and n_obs increase (as I believe it should).

All metrics were calculated by the methods described in Tim Morris' paper here.

Some possibly useful details about the model: the data are log-normally distributed. I am using the gamlss package in R, and the parameter I'm interested in is the group:time interaction in the sigma model, which I use to investigate the effect of group on the change in within-subject variability. The model code is:

model <- gamlss::gamlss(
    formula  = y ~ time + group:time + re(random = ~1 | subject),    # mu model: random intercept per subject
    sigma.fo = ~ time + group:time + re(random = ~time | subject),   # sigma model: random intercept and slope
    family   = "LOGNO2",
    data     = data
  )
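The bootstrap confidence interval described in the comments below can be sketched roughly as follows. This is a simplified version of the approach in the linked repository; the coefficient names and the subsetting of `vcov()` are assumptions that would need checking against the actual fitted object:

```r
library(MASS)  # for mvrnorm

# draw sigma-model coefficient vectors from their approximate sampling distribution
n_boot <- 10000
beta   <- coef(model, what = "sigma")
V      <- vcov(model)                      # covariance of the fitted coefficients
sig_ix <- grep("sigma", rownames(V))       # rows for the sigma model (assumed naming)
draws  <- MASS::mvrnorm(n_boot, mu = beta, Sigma = V[sig_ix, sig_ix])

# back-transform: the sigma model uses a log link, so exp() gives SDs on the data scale
sd_pre     <- exp(draws[, "(Intercept)"])
sd_post_g2 <- exp(rowSums(draws[, c("(Intercept)", "time", "time:group2")]))
change     <- sd_post_g2 - sd_pre          # absolute change in SD for group 2
quantile(change, c(0.025, 0.975))          # percentile CI on the interpretable scale
```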
  • Please include a link to the data and also the simulation code – Robert Long Aug 22 '21 at 07:50
  • How is the confidence interval for the `group:time` term in the sigma formula calculated? – COOLSerdash Aug 22 '21 at 08:12
  • @RobertLong Sorry for the delay, I've put the code on a repository here. https://github.com/RWParsons/simulation_study The main function which performs a single simulation is `fx_do_one_sim` in data_grid_and_simulator.R – Rex Parsons Aug 22 '21 at 08:15
  • @COOLSerdash I bootstrap samples from the multivariate normal distribution 10,000 times and take the 0.025 and 0.975 quantiles after putting these back on the interpretable scale. See lines 227-256 of https://github.com/RWParsons/simulation_study/blob/main/src/models.R – Rex Parsons Aug 22 '21 at 08:17
  • Couple of thoughts: i) Are you sure you're comparing the confidence intervals to the correct value? You generate the data using `rlnorm` but `LOGNO2` has a different parametrization. $\mu$ for example in `LOGNO2` denotes the median, not the mean on the log-scale. ii) How is the coverage if you use `confint(..., parm = "", what = "sigma")`? – COOLSerdash Aug 22 '21 at 09:15
  • Thanks for the feedback @COOLSerdash. For your first point, that's definitely something I'll need to consider when later interpreting the mean model, but I'm not sure it would influence my current problem, as I'm only interpreting the sigma model at the moment. For ii), I have just done a quick test and I can get the confidence intervals for the `time:group2` interaction, but since there is a log link for the sigma model, this isn't interpretable as an absolute change in the SD, which is what I'm assessing coverage over. That's why I went down the path of the bootstrapping approach. – Rex Parsons Aug 22 '21 at 09:48
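On the parametrization point raised in the comments: in gamlss, LOGNO2 is parametrized so that mu is the median on the original scale (i.e. log(Y) ~ N(log(mu), sigma^2)), whereas `rlnorm` takes the log-scale mean `meanlog` directly. A quick base-R check of the correspondence (no gamlss needed):

```r
set.seed(1)
m <- 2
s <- 0.5
y <- rlnorm(1e6, meanlog = m, sdlog = s)

# rlnorm(meanlog = m) corresponds to LOGNO2 with mu = exp(m):
# the sample median should be close to exp(m), not to the mean exp(m + s^2/2)
median(y)         # close to exp(2) ~ 7.39
exp(m)            # the LOGNO2 mu implied by this rlnorm call
exp(m + s^2 / 2)  # the mean of the log-normal, ~ 8.37, which is larger
```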

0 Answers