Currently cross-posted at https://stackoverflow.com/questions/63492814/interaction-not-significant-but-one-simple-effect-significant-linear-mixed-mod because I wasn't sure which site was more appropriate; StackOverflow tends to get more traffic and responses. Suggestions on where best to post are welcome, with the hope of getting useful feedback.
Background: I have fit a linear mixed model using lmer() from the lme4 package in R, with two binary categorical predictors entered as dummy variables. One (Intervention) is within-subjects; the other (Sex) is between-subjects. The model accounts for two levels of correlation with random effects (the data structure and model code are described below). The outcome is a proportion, but it is very well-behaved: the mean is around 0.5, the range is about 0.2 to 0.9, and the values are approximately normally distributed. Accordingly, the residuals indicate the assumptions (normality, equal variance) are met, so I don't think what I'm observing is due to violating the assumptions of a linear (mixed) model.
Issue: The following holds no matter which random-effects structure I use (candidates listed below). In every case, the test statistic for the interaction between the two binary categorical predictors is about 1.7 in magnitude, while that for one of the binary predictors is always about 2.8 (the test statistic for the other is ~1.3). Although there is debate about how to accurately calculate p-values for these types of models (and whether we even should; I'm aware of that discussion), it is clear that no matter the degrees of freedom used, the interaction term would not be considered statistically significant (with, say, $\alpha$ = 0.05), while the one predictor would. Note that the estimate for the individual predictor is a simple effect, since the predictors are binary and dummy-coded. I used emmeans() to look at all four possible simple effects, and only one is statistically significant (the one with the test statistic of about 2.8).
I cannot figure out how the interaction could lack significance while one of the four possible simple effects is significant. I could understand it if the test statistics/p-values were borderline, making it a potential issue of power. Here, however, the ballpark p-value for the interaction term (test stat ~1.7) is about 0.09, while a rough p-value for the simple effect (test stat ~2.8) is about 0.007. That they differ by an order of magnitude seems problematic to me, and it makes me worry that I am inherently modeling the data incorrectly, although if so, I can't see where the error is.
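My understanding is that the interaction coefficient here is the difference between the two simple Post - Pre effects, so its standard error pools both simple-effect SEs. A base-R arithmetic sketch using the (rounded) estimates from the Model 1 output below, assuming the two simple effects are approximately independent because Sex is between-subjects:

```r
# Rounded estimates/SEs taken from the Model 1 emmeans output below
b_female  <- -0.05124; se_female <- 0.0184  # Post - Pre, Sex = Female
b_male    <- -0.00885; se_male   <- 0.0172  # Post - Pre, Sex = Male

# The interaction contrast is the difference of the two simple effects;
# assuming independence, its variance is the sum of their variances
b_int  <- b_male - b_female
se_int <- sqrt(se_female^2 + se_male^2)

c(estimate = b_int, se = se_int, t = b_int / se_int)
# Matches the interaction row of the coefficient table in magnitude:
# estimate ~0.0424, se ~0.0252, t ~1.68
```

So the interaction test can have a noticeably larger SE (hence larger p-value) than either simple effect, even when one simple effect is clearly significant.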
Data structure: Each subject has an observed proportion for each of six different images (out of 12 possible images to which they could have been randomly assigned): three images were viewed pre-intervention and three post-intervention. There is thus potential correlation due to both subject and image, so both are treated as random effects. As noted, Intervention is within-subjects and Sex is between-subjects.
Here is a small dummy dataset (not actual data; the real data have 59 unique subjects, 29 of one sex and 30 of the other):
data01 <- structure(list(Subject = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 5L,
5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L), Image = c("B", "A",
"G", "E", "C", "I", "C", "G", "L", "A", "D", "F", "E", "A", "K",
"B", "C", "I", "D", "F", "H", "J", "L", "B", "D", "F", "A", "L",
"C", "E", "J", "K", "F", "B", "A", "D"), Intervention = c("Pre", "Pre", "Pre", "Post",
"Post", "Post", "Pre", "Pre", "Pre", "Post", "Post", "Post", "Pre",
"Pre", "Pre", "Post", "Post", "Post", "Pre", "Pre", "Pre",
"Post", "Post", "Post", "Pre", "Pre", "Pre", "Post", "Post", "Post",
"Pre", "Pre", "Pre", "Post", "Post", "Post"), Sex = c("Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Male"), Prop = c(0.488277,
0.236734, 0.41036, 0.745403, 0.464705, 0.625076, 0.5602122, 0.590909, 0.333266, 0.365954, 0.374941, 0.662141, 0.64877, 0.434947, 0.721343, 0.5288113, 0.782714,
0.603777, 0.4480342, 0.629813, 0.347684, 0.41906, 0.553854, 0.639324, 0.389804, 0.49155, 0.355763, 0.695487, 0.537433, 0.650022, 0.54022, 0.58907, 0.666208,
0.713883, 0.625882, 0.434924)), class = "data.frame", row.names = c(NA, -36L))
Candidate models considered, each with varying random effects:
Model 1 (gave a convergence warning); note the output below is from my actual data, not the dummy dataset given above:
largest_lmer <- lmer(Prop ~ factor(Sex)*factor(Intervention) +
(1 | Image) +
(1 + Intervention | Subject),
data = data01)
coef(summary(largest_lmer))
# Estimate Std. Error t value
# (Intercept) 0.51415277 0.03503742 14.674389
# factor(Sex)Male 0.04019813 0.03006458 1.337059
# factor(Intervention)Pre 0.05123982 0.01830275 2.799569
# factor(Sex)Male:factor(Intervention)Pre -0.04238911 0.02509809 -1.688938
install.packages("emmeans")
library(emmeans)
largest_lmer_emm_Int <- emmeans(largest_lmer, ~ factor(Sex) | factor(Intervention))
pairs(largest_lmer_emm_Int)
# Intervention = Post:
# contrast estimate SE df t.ratio p.value
# Female - Male -0.04020 0.0301 57.3 -1.336 0.1867
#
# Intervention = Pre:
# contrast estimate SE df t.ratio p.value
# Female - Male 0.00219 0.0307 57.2 0.071 0.9434
#
# Degrees-of-freedom method: kenward-roger
largest_lmer_emm_Sex <- emmeans(largest_lmer, ~ factor(Intervention) | factor(Sex))
pairs(largest_lmer_emm_Sex)
# Sex = Female:
# contrast estimate SE df t.ratio p.value
# Post - Pre -0.05124 0.0184 56.5 -2.789 0.0072 **This is the significant simple effect**
#
# Sex = Male:
# contrast estimate SE df t.ratio p.value
# Post - Pre -0.00885 0.0172 55.0 -0.515 0.6084
#
# Degrees-of-freedom method: kenward-roger
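On the p-value question: a hedged sketch of one common approach, refitting the same call with lmerTest (assumed installed) so that summary() reports denominator degrees of freedom and p-values; ddf = "Kenward-Roger" matches the degrees-of-freedom method in the emmeans output above:

```r
library(lmerTest)  # wraps lme4::lmer and adds df/p-values to summary()

largest_lmer_p <- lmerTest::lmer(Prop ~ factor(Sex)*factor(Intervention) +
                                   (1 | Image) +
                                   (1 + Intervention | Subject),
                                 data = data01)

# Kenward-Roger denominator df, matching the emmeans output above
summary(largest_lmer_p, ddf = "Kenward-Roger")$coefficients
```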
Model 2: All output similar to that from Model 1 (not repeated here):
medium_lmer <- lmer(Prop ~ factor(Sex)*factor(Intervention) +
(1 | Image) +
(1 | Subject) +
(1 | Intervention:Subject),
data = data01)
Model 3: All output similar to that from Model 1 (not repeated here):
smallest_lmer <- lmer(Prop ~ factor(Sex)*factor(Intervention) +
(1 | Image) +
(1 | Subject),
data = data01)
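For what it's worth, a hedged sketch of how the candidate random-effects structures can be compared (the fixed effects are identical, so the REML fits are directly comparable; the usual chi-square reference is conservative for variance components on the boundary):

```r
# Compare the nested random-effects structures on their REML fits;
# refit = FALSE keeps REML rather than refitting with ML
anova(smallest_lmer, medium_lmer, largest_lmer, refit = FALSE)
```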
As I mentioned, all of these candidate models gave roughly the test statistics noted above; they did not vary with the random effects included. The model assumptions (normality, equal variance) were met. Is there something else I'm missing? Or is it mathematically possible to have a non-significant interaction but a significant simple effect whose test statistics/p-values differ as much as these two do?
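For completeness, the interaction can also be pulled out directly in emmeans; by construction it is the difference of the two simple Post - Pre effects, which may be why its SE (and hence p-value) behaves so differently from either simple effect on its own:

```r
largest_lmer_emm <- emmeans(largest_lmer, ~ Sex * Intervention)

# The "pairwise : pairwise" interaction contrast is the (Female - Male)
# difference of the (Post - Pre) simple effects, i.e., the interaction term
contrast(largest_lmer_emm, interaction = "pairwise")
```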