I have a longitudinal model, response ~ device * time, with two devices (A and B). The devices are expected to lower the response over time, and there is a device-by-time interaction.
The problem: a model fitted with gls() (REML estimation, unstructured covariance) and then passed to a function that computes the EM-means behaves as follows:
When I fit response ~ device * time and request Dunnett contrasts (with Dunnett adjustment), specified conditionally as time | device, I get:
device: A
t1 - t0 : p=0.0001
t2 - t0 : p=0.0001
device: B
t1 - t0 : p=0.07
t2 - t0 : p=0.054
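
Roughly, the full-model analysis looked like the sketch below. Since I can't share the actual code or data, the data frame `dat`, the column names (`id`, `response`, `device`, `time`), and the exact `corSymm`/`varIdent` specification are placeholders rather than my real call:

```r
library(nlme)
library(emmeans)

# GLS with REML and an unstructured covariance over the repeated time points:
# corSymm() gives a general (unstructured) correlation matrix within each id,
# varIdent() allows a separate residual variance per time point.
fit_full <- gls(
  response ~ device * time,
  data        = dat,
  correlation = corSymm(form = ~ 1 | id),
  weights     = varIdent(form = ~ 1 | time),
  method      = "REML"
)

# EM-means of time within each device, then contrasts of t1 and t2 against
# the baseline t0; a Dunnett-type adjustment is the default for this family.
emm_full <- emmeans(fit_full, ~ time | device)
contrast(emm_full, method = "trt.vs.ctrl", ref = 1)
```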
When I fit response ~ time separately for device A and device B (the data set filtered to A only or B only) and then run Dunnett contrasts on the EM-means of each model, I get:
device: A
t1 - t0 : p=0.0001
t2 - t0 : p=0.0001
device: B
**t1 - t0 : p=0.01**
**t2 - t0 : p=0.01**
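
The per-device fits were sketched the same way (same caveats about the placeholder names):

```r
# Same unstructured covariance, but fitted to each device's data separately,
# so the covariance and residual variances are estimated per device.
fit_A <- gls(
  response ~ time,
  data        = subset(dat, device == "A"),
  correlation = corSymm(form = ~ 1 | id),
  weights     = varIdent(form = ~ 1 | time),
  method      = "REML"
)
fit_B <- update(fit_A, data = subset(dat, device == "B"))

# Dunnett-style contrasts against t0, one set per model:
contrast(emmeans(fit_A, ~ time), method = "trt.vs.ctrl", ref = 1)
contrast(emmeans(fit_B, ~ time), method = "trt.vs.ctrl", ref = 1)
```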
The second approach gives smaller p-values and a slightly larger effect on the EM-mean scale.
And a simple paired t-test run on the same comparisons (Bonferroni-corrected) also gives low p-values for device B:
**t1 - t0 : p=0.01**
**t2 - t0 : p=0.02**
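
The t-tests for device B were essentially this (again a sketch; the reshape to wide format assumes one row per id and time, and the reshaped column names are illustrative):

```r
# Paired t-tests of t1 vs t0 and t2 vs t0 for device B, Bonferroni-corrected.
dat_B <- subset(dat, device == "B")
wide  <- reshape(dat_B[, c("id", "time", "response")],
                 idvar = "id", timevar = "time", direction = "wide")

p_raw <- c(
  "t1 - t0" = t.test(wide$response.t1, wide$response.t0, paired = TRUE)$p.value,
  "t2 - t0" = t.test(wide$response.t2, wide$response.t0, paired = TRUE)$p.value
)
p.adjust(p_raw, method = "bonferroni")
```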
My guess at what happened:

- The paired t-test uses only the two time points involved in each comparison and ignores the joint covariance among t0, t1 and t2, so it reports relatively small p-values even after the multiple-comparison correction.
- gls() fitted as two separate models accounts for the within-subject covariance but not for the other device's data, which "attenuates" the result somewhat, though not by much.
- gls() with the full device*time interaction attenuates it the most (see the sketch below for one way to compare the estimated covariances).
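
One thing I could check to see where the attenuation comes from (sketch, same placeholder object names as above) is how the estimated covariance differs between the pooled and the per-device fits, since the full model as sketched estimates a single unstructured covariance shared by both devices:

```r
# Marginal covariance matrices implied by each fit (nlme::getVarCov works on
# gls objects fitted with a correlation structure):
getVarCov(fit_full)  # one pooled covariance for both devices
getVarCov(fit_A)     # covariance from device A data only
getVarCov(fit_B)     # covariance from device B data only
```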
Which model is better? I know there is no simple advice, especially since I cannot provide a reproducible example (I can't share the data), but maybe some suggestions?
From the client's perspective, the pairwise paired t-test is the simplest method and shows nice significance over time for each device separately. The two separate per-device models also look fine. But the full model, which seems the most "appropriate" because it models the covariance and uses all of the information, does not show significance for device B.
I'm not hunting for significance; I just want to understand why this happens. And if this is expected, is it always safer to run the full model and sacrifice significance? Doesn't that increase the type II error?