I have a longitudinal model, response ~ device * time, with two devices (A and B). The devices are expected to lower the response over time, and there is a device-by-time interaction.
The problem: a model fitted with gls() (REML estimation, unstructured covariance) and then passed to a function that computes the EM-means behaves as follows:
When I fit response ~ device * time and request Dunnett contrasts (with Dunnett adjustment), specified conditionally as time | device, I get:
device: A
t1 - t0 : p=0.0001
t2 - t0 : p=0.0001
device: B
t1 - t0 : p=0.07
t2 - t0 : p=0.054
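
Roughly, the full-model analysis looked like the sketch below. Since I can't share the actual code or data, the data frame `dat`, the column names (`id`, `response`, `device`, `time`), and the exact `corSymm`/`varIdent` specification are placeholders rather than my real call:

```r
library(nlme)
library(emmeans)

# GLS with REML and an unstructured covariance over the repeated time points:
# corSymm() gives a general (unstructured) correlation matrix within each id,
# varIdent() allows a separate residual variance per time point.
fit_full <- gls(
  response ~ device * time,
  data        = dat,
  correlation = corSymm(form = ~ 1 | id),
  weights     = varIdent(form = ~ 1 | time),
  method      = "REML"
)

# EM-means of time within each device, then contrasts of t1 and t2 against
# the baseline t0; a Dunnett-type adjustment is the default for this family.
emm_full <- emmeans(fit_full, ~ time | device)
contrast(emm_full, method = "trt.vs.ctrl", ref = 1)
```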
When I fit response ~ time separately for device A and device B (the data set filtered to A only or B only) and then run Dunnett contrasts on the EM-means of each model, I get:
device: A
t1 - t0 : p=0.0001
t2 - t0 : p=0.0001
device: B
**t1 - t0 : p=0.01**
**t2 - t0 : p=0.01**
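
The per-device fits were sketched the same way (same caveats about the placeholder names):

```r
# Same unstructured covariance, but fitted to each device's data separately,
# so the covariance and residual variances are estimated per device.
fit_A <- gls(
  response ~ time,
  data        = subset(dat, device == "A"),
  correlation = corSymm(form = ~ 1 | id),
  weights     = varIdent(form = ~ 1 | time),
  method      = "REML"
)
fit_B <- update(fit_A, data = subset(dat, device == "B"))

# Dunnett-style contrasts against t0, one set per model:
contrast(emmeans(fit_A, ~ time), method = "trt.vs.ctrl", ref = 1)
contrast(emmeans(fit_B, ~ time), method = "trt.vs.ctrl", ref = 1)
```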
The second approach gives smaller p-values and a slightly larger effect on the EM-mean scale.
And a simple paired t-test run on the same comparisons (Bonferroni-corrected) also gives low p-values for device B:
**t1 - t0 : p=0.01**
**t2 - t0 : p=0.02**
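
The t-tests for device B were essentially this (again a sketch; the reshape to wide format assumes one row per id and time, and the reshaped column names are illustrative):

```r
# Paired t-tests of t1 vs t0 and t2 vs t0 for device B, Bonferroni-corrected.
dat_B <- subset(dat, device == "B")
wide  <- reshape(dat_B[, c("id", "time", "response")],
                 idvar = "id", timevar = "time", direction = "wide")

p_raw <- c(
  "t1 - t0" = t.test(wide$response.t1, wide$response.t0, paired = TRUE)$p.value,
  "t2 - t0" = t.test(wide$response.t2, wide$response.t0, paired = TRUE)$p.value
)
p.adjust(p_raw, method = "bonferroni")
```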
My guess at what happened:

- The paired t-test uses only the two time points involved in each comparison and ignores the joint covariance among t0, t1 and t2, so it reports relatively small p-values even after the multiple-comparison correction.
- gls() fitted as two separate models accounts for the within-subject covariance but not for the other device's data, which "attenuates" the result somewhat, though not by much.
- gls() with the full device*time interaction attenuates it the most (see the sketch below for one way to compare the estimated covariances).
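
One thing I could check to see where the attenuation comes from (sketch, same placeholder object names as above) is how the estimated covariance differs between the pooled and the per-device fits, since the full model as sketched estimates a single unstructured covariance shared by both devices:

```r
# Marginal covariance matrices implied by each fit (nlme::getVarCov works on
# gls objects fitted with a correlation structure):
getVarCov(fit_full)  # one pooled covariance for both devices
getVarCov(fit_A)     # covariance from device A data only
getVarCov(fit_B)     # covariance from device B data only
```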
Which model is better? I know there is no simple advice, especially since I cannot provide a reproducible example (I can't share the data), but maybe some suggestions?
From the client's perspective, the pairwise paired t-test is the simplest method and shows nice significance over time for each device separately. The two separate per-device models also look fine. But the full model, which seems the most "appropriate" because it models the covariance and uses all of the information, does not show significance for device B.
I'm not hunting for significance; I just want to understand why this happens. And if this is expected, is it always safer to run the full model and sacrifice significance? Doesn't that increase the type II error?