
I am currently running some linear models and I'm trying to simplify them a bit to make analysis easier. I have been using lrtest() and AIC() to compare them and find the best model. When I removed an interaction from my model (e.g. model2), one of the factors (y) became significant which previously wasn't when the interaction was in my model (e.g. model1).

model1 <- lm(x ~ y * z)  # with the y:z interaction
model2 <- lm(x ~ y + z)  # main effects only

When I looked at the summary of model1 there was a significant interaction, but I didn't notice it and reduced my model further to model2. However, when I compare model1 and model2, the AIC difference is less than 2, and lrtest() says they are not significantly different (p = 0.1237).
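For context, here is a minimal reproducible sketch of the kind of comparison I mean (the data are simulated, since my real x, y and z aren't shown here):

```r
# Simulated stand-in for the real data (x, y, z are not from my actual study)
set.seed(42)
n <- 100
y <- rnorm(n)
z <- rnorm(n)
x <- 1 + 0.5 * y + 0.3 * z + 0.1 * y * z + rnorm(n)

model1 <- lm(x ~ y * z)   # with interaction
model2 <- lm(x ~ y + z)   # main effects only

AIC(model1, model2)       # AIC difference < 2 means little support for either model
anova(model2, model1)     # F-test of the nested models; lrtest() from lmtest is similar
```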

I was wondering: is it better not to oversimplify my model, in case doing so makes other factors significant when, from visualising my data, there doesn't really seem to be an effect for that factor? Should I just stick with my original model (i.e. can you get 'false' results by simplifying, but not by keeping the original)?

kjetil b halvorsen
  • It's not meaningful to assess the significance of a main effect if there's an interaction in the model. – mkt Jul 31 '19 at 14:56
  • As for which model to prefer, that isn't clear from the output you've presented. One convention is to prefer the simpler model unless there is clear evidence (such as by AIC) that the more complex model is an improvement. But that's not necessarily a strategy that's always best. – mkt Jul 31 '19 at 14:58
  • What is the meaning of the effects and the interaction? Which model would make the most sense? What is the meaning of the data in relation to your beliefs about the different models? If you are arguing to choose a model just because it makes your results significant, then your motivation for the study seems to be wrong. What is your goal: to study some initial question(s), or just to gather some data and show that there is something significant (but meaningless) in it? – Sextus Empiricus Aug 02 '19 at 10:37
  • Regarding the reasons why y may be significant in one model but not the other, see: https://stats.stackexchange.com/questions/20452/ The significance test compares the model with the effect y to a model without it (the null hypothesis, or absence of the effect). When you have many other (possibly irrelevant) factors in the model, this test becomes less *powerful*. Note that a lack of significance does not mean the effect is absent; it just means your data do not show it, which may be because of a noisy or small data set. – Sextus Empiricus Aug 02 '19 at 10:47

1 Answer


The interpretation of the significance tests of the regression coefficients changes when you include an interaction term. Contrary to the oversimplified recommendation "you cannot interpret a main effect when there is an interaction", you actually can; you just need to know what the regression coefficients mean:

In your y + z model, the coefficient of y (or z, respectively) is the change in the predicted x-score when you increase that predictor by 1 unit while holding the other predictor constant at any fixed value.

In the presence of an interaction, the regression coefficients of y and z are the change in the predicted x-score when you increase y (or z, respectively) by 1 unit while fixing the other predictor at zero.

You can look up this interpretation in most regression textbooks, for instance.
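To see this concretely, here is a small R sketch (with simulated data, not the question's) showing that the main-effect coefficient in an interaction model is the slope at the point where the other predictor equals zero, so simply centering the predictors changes it while the fitted model stays the same:

```r
# Illustrative simulation (not the OP's data): y and z deliberately have
# nonzero means so that "z = 0" is far from the center of the data
set.seed(1)
n <- 200
y <- rnorm(n, mean = 5)
z <- rnorm(n, mean = 3)
x <- 2 + 1 * y + 0.5 * z + 0.4 * y * z + rnorm(n)

fit_raw      <- lm(x ~ y * z)
fit_centered <- lm(x ~ y * z,
                   data = data.frame(x, y = y - mean(y), z = z - mean(z)))

coef(fit_raw)["y"]        # slope of y when z == 0
coef(fit_centered)["y"]   # slope of y at the mean of z: a different number,
                          # even though both models give identical fitted values
```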

From what you report, it seems to me that your data cannot really distinguish between the interaction model and the main-effects-only model. What that means depends on the sample size. If you have a very large sample, the interaction effect is probably very small, potentially irrelevant (but that depends on your substantive theory!). If you have a small sample (e.g. N < 300), it may well be that a substantively relevant interaction is concealed by low power.
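A quick simulation (purely illustrative numbers, assuming a true interaction of 0.3 on standardized predictors) shows how a genuinely present interaction can fail to reach significance at a small sample size:

```r
# Power of the interaction test at two sample sizes (illustrative only)
set.seed(7)
pval_interaction <- function(n) {
  y <- rnorm(n)
  z <- rnorm(n)
  x <- y + z + 0.3 * y * z + rnorm(n)    # the interaction really exists
  summary(lm(x ~ y * z))$coefficients["y:z", "Pr(>|t|)"]
}

mean(replicate(500, pval_interaction(30))  < 0.05)  # small n: power is low
mean(replicate(500, pval_interaction(300)) < 0.05)  # large n: power is much higher
```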

Many people would argue that, when in doubt, the more parsimonious model is preferable (Ockham's razor), but you really need to consider this from a substantive point of view.

Another important consideration is that significance tests are not as meaningful as you might think after model selection (and this gets worse the more models you have tested); look up post-selection inference, for example.

To offer a more specific opinion, I'd need to know more about what you are actually doing, but I hope this input helps clarify what you should consider from here.

StoryTeller0815