Is it statistically valid to enter predictors that are not significantly related to DV in a linear mixed effects model?

Question

Is it statistically valid to enter "control" predictors that are theoretically relevant in a linear mixed effects model (aka multilevel model), even if only very weak and non-significant linear relationships were demonstrated beforehand?

The predictors in question could be considered control variables (e.g., age, SES, score 1, score 2). The main aim of the analysis is to examine the relative effects of these between-subjects predictors and a within-subjects predictor (experimental condition) on a continuous DV. Linear mixed effects modelling is being used, as this approach can take into account the non-independent observations.

Linearity was explored by examining scatterplots and computing bivariate Pearson correlations (the coefficients were small and non-significant). Unsurprisingly, since only weak and non-significant linear relationships were found in screening, the control variables do not explain variance in the DV when entered into the linear mixed effects model. But dropping them doesn't seem valid given their theoretical relevance. It seems more useful to keep them in the model as controls to soak up some "noise", and then discuss how, in this particular sample, they did not actually contribute to variance in the DV. In addition, I think it makes sense to keep them in the model and check for possible interactions with the main exploratory predictor (condition).

Would this approach be statistically valid?

Wouldn't "practically relevant" require that they be statistically associated? (Perhaps not in a linear way.) At any rate, please say more. Have you looked for nonlinear relationships? Also, how can we judge this "should" question, i.e., what are you trying to accomplish with this model--predict the DV for individual cases? Compare the importance of different IV? See how well the DV can be predicted, as with R-squared? Etc. — rolando2, Jan 07 '17 at 19:45
See also ["univariate/univariable screening"](http://stats.stackexchange.com/search?q=univariate+screening). — Scortchi - Reinstate Monica, Jan 12 '17 at 12:14

score 1 · Answer 1 · answered Jan 08 '17 at 20:36

Yes, they should be included. First, the fact that, pair-wise, a predictor appears to have a non-linear relationship with the dependent variable does not necessarily imply that, in the presence of all other included predictors, the remaining association will still be non-linear.
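
A minimal simulated sketch of this first point, using plain NumPy least squares rather than a mixed model. All variable names, coefficients, and sample sizes are made up for illustration. The DV depends linearly on `x` and on a second predictor `z` that itself tracks `x` squared, so the pairwise relationship between `y` and `x` looks curved, yet the curvature should largely vanish once `z` is in the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data: z is a second predictor that is a noisy function of x squared.
x = rng.uniform(-2, 2, n)
z = x**2 + rng.normal(0, 0.5, n)
# y is linear in both x and z; there is no direct quadratic effect of x.
y = 1.0 * x + 1.0 * z + rng.normal(0, 1.0, n)

def ols_coefs(columns, response):
    """Ordinary least squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(response))] + list(columns))
    coefs, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coefs

# Pairwise view: y on x and x^2 only. The quadratic term picks up z's effect.
quad_without_z = ols_coefs([x, x**2], y)[2]
# Full model: once z is included, the quadratic term in x should be near zero.
quad_with_z = ols_coefs([x, x**2, z], y)[2]

print(f"x^2 coefficient without z: {quad_without_z:.2f}")
print(f"x^2 coefficient with z:    {quad_with_z:.2f}")
```

The same logic is why bivariate screening is a weak basis for deciding what enters a multi-predictor model: the shape and strength of an association can change once the other predictors are conditioned on.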

Second, non-linear relationships can be handled by also including non-linear functions of the predictor alongside its level, e.g. both "Age" and "Age-squared".
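
As a rough illustration of the "Age" plus "Age-squared" idea, here is a simulated sketch with NumPy least squares. The data are invented (a U-shaped effect centred at age 40 is an assumption, not something from the question), and a real analysis would put the same two terms in the fixed-effects part of the mixed model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Hypothetical data: the DV is lowest near age 40 and rises on both sides.
age = rng.uniform(20, 60, n)
age_c = age - age.mean()  # centring reduces collinearity between age and age^2
y = 0.01 * (age - 40) ** 2 + rng.normal(0, 0.5, n)

def r_squared(columns, response):
    """R-squared of an OLS fit with an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(response))] + list(columns))
    coefs, *_ = np.linalg.lstsq(design, response, rcond=None)
    residuals = response - design @ coefs
    return 1 - residuals.var() / response.var()

# A Pearson correlation only measures the linear part of the relationship.
pearson_r = np.corrcoef(age, y)[0, 1]
r2_linear = r_squared([age_c], y)
r2_quadratic = r_squared([age_c, age_c**2], y)

print(f"Pearson r (age, y):     {pearson_r:.3f}")
print(f"R^2 with age only:      {r2_linear:.3f}")
print(f"R^2 with age and age^2: {r2_quadratic:.3f}")
```

In a setup like this, a near-zero Pearson coefficient sits alongside a strong relationship, which is why a weak linear correlation alone does not show that a predictor is irrelevant.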

score 0 · Answer 2 · answered Jan 08 '17 at 03:46

The assumption of linearity is not violated unless there is a nonlinear relationship. There is little point to including variables unrelated to the criterion but it shouldn't matter unless you have a pretty small sample size.

David Lane


I have clarified the question based on your answer. The predictors and DV had very weak linear relationships. I think they should be included on this basis, even though not strong or significant. – ambalashes Jan 08 '17 at 18:41
I agree. Only if your sample size were so small that the loss of df for including these variables made a meaningful difference would I leave them out. – David Lane Jan 08 '17 at 18:55
