
Let's say I have three factors, f1, f2 and f3, and fit these models:

m1 <- lmer(y ~ f1 + (1|sub))
m2 <- lmer(y ~ f1*f2 + (1|sub))
m3 <- lmer(y ~ f1*f2*f3 + (1|sub))

Does it make sense to compare the models with anova(m1, m2, m3)? Or would you first test every single step?

m1 <- lmer(y ~ f1 + (1|sub))
m2 <- lmer(y ~ f1 + f2 + (1|sub))
m3 <- lmer(y ~ f1 + f2 + f3 + (1|sub))
m4 <- lmer(y ~ f1*f2 + f3 + (1|sub))
etc.

I suppose the second example is correct. But what if m1 and m2 do not differ statistically? Does that mean I should not include f2 and f3 in my model? But what if for example f2 and f3 are significant in model m3? This happens with my real data, for example m1 is not significantly different from m2, but f1 interacts significantly with f2 when the interaction is added to the model. I just don't see the point in comparing models then.

locus
  • compare for what purpose? if it's forecasting performance then you compare anything with anything – Aksakal Nov 15 '18 at 01:10

2 Answers


It would be better to perform the omnibus test for all interactions together, i.e.,

fm_additive <- lmer(y ~ f1 + f2 + f3 + (1 | sub))
fm_inter <- lmer(y ~ f1 * f2 * f3 + (1 | sub))
anova(fm_additive, fm_inter)

and see from it whether the interactions, taken together, add anything to the model.
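As a minimal sketch of that omnibus test, the following runs it on simulated data. The data frame `d`, the design (20 subjects, two levels per factor) and the effect sizes are all made up for illustration. Both models are fitted with maximum likelihood, since they differ in their fixed effects.

```r
library(lme4)

set.seed(1)
# Hypothetical balanced design: 20 subjects, each observed once
# in every f1 x f2 x f3 cell (an assumption for illustration)
d <- expand.grid(sub = factor(1:20),
                 f1  = factor(c("a", "b")),
                 f2  = factor(c("a", "b")),
                 f3  = factor(c("a", "b")))
sub_eff <- rnorm(20, sd = 1)  # random intercept per subject

# Simulated response: main effect of f1, an f1:f2 interaction, no effect of f3
d$y <- 0.5 * (d$f1 == "b") +
       0.8 * (d$f1 == "b" & d$f2 == "b") +
       sub_eff[as.integer(d$sub)] +
       rnorm(nrow(d), sd = 1)

# ML fits (REML = FALSE) because the fixed effects differ between the models
fm_additive <- lmer(y ~ f1 + f2 + f3 + (1 | sub), data = d, REML = FALSE)
fm_inter    <- lmer(y ~ f1 * f2 * f3 + (1 | sub), data = d, REML = FALSE)

# Likelihood-ratio test of all four interaction terms at once
lrt <- anova(fm_additive, fm_inter)
print(lrt)
```

The test compares the additive model with the full factorial one in a single step, so it has 4 degrees of freedom here (three two-way interactions plus the three-way interaction).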

Dimitris Rizopoulos

The second approach you describe is called stepwise regression and is not a good idea. You should base the inclusion of variables first and foremost on theory. If you believe f1, f2 and f3 are only meaningful when their interactions with one another are included, then the first method of comparing the models would be better.

Do note that, among nested models, the one with more parameters never fits the sample worse. However, a better fit does not make a better model: it may overfit the sample and predict new observations poorly.
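A small sketch of this point, using ordinary `lm()` rather than `lmer()` to keep it short (an assumption for illustration; the variables `x`, `noise` and `y` are simulated). Adding a predictor that is unrelated to the response still cannot lower the in-sample log-likelihood, whereas a penalised criterion such as AIC need not prefer the larger model.

```r
set.seed(2)
n     <- 100
x     <- rnorm(n)
noise <- rnorm(n)               # unrelated to y by construction
y     <- 1 + 0.5 * x + rnorm(n)

small <- lm(y ~ x)
big   <- lm(y ~ x + noise)      # nests 'small', one extra parameter

# In-sample fit: the larger nested model never has a lower log-likelihood
ll <- c(small = as.numeric(logLik(small)), big = as.numeric(logLik(big)))
print(ll)

# AIC charges for the extra parameter, so it may or may not favour 'big'
aic <- c(small = AIC(small), big = AIC(big))
print(aic)
```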

It might benefit you to read e.g. this question (and the many warnings about stepwise regression expressed in its comments and highest-rated answer) or the answer here.

Frans Rodenburg
  • OP isn't necessarily trying to select a model, they may simply be trying to test hypotheses about the effects of different variables in the model ... – Ben Bolker Nov 15 '18 at 01:57
  • Thanks @Frans Rodenburg but the first method seems strange to me because `m1 – locus Nov 28 '18 at 00:19