When is it valid to include interaction terms in a regression model?

Question

I am using logistic regression to analyze some categorical data (a binary response variable and categorical, mostly binary, predictor variables). My model is something like A ~ B, and my hypothesis is that a respondent's B has some explanatory power over their choice of A. When I run this regression, only the intercept has a statistically significant p-value.

However, I have another variable, C, that captures some pre-existing conditions for each respondent. When I run a logit regression on A ~ B + C, C has a very low p-value (it is statistically significant). That is to say, the pre-existing preferences that each respondent has, as reflected by C, appear to have an effect on their choice of A.

My question, then, is whether it is appropriate to add an interaction term B*C to my regression in this case. When I run the logit regression A ~ B * C (equivalently, A ~ B + C + B:C), B, C, and the interaction term B:C all have high statistical significance (low p-values). Is this statistically valid? Does it make sense for something to become statistically significant when an interaction term is added to the model?
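To see what the coefficients of A ~ B * C mean, here is a minimal sketch in Python (standard library only), assuming B and C are both coded 0/1. With two binary predictors the interaction model is saturated: it has one parameter for each of the four (B, C) cells, so its maximum-likelihood fit reproduces each cell's observed log-odds of A, and the coefficients can be read directly off those log-odds. The counts and the helper names `logit` and `saturated_coefficients` are hypothetical, chosen only for illustration.

```python
import math

def logit(successes, total):
    """Observed log-odds of A = 1 in one (B, C) cell."""
    return math.log(successes / (total - successes))

def saturated_coefficients(counts):
    """Coefficients of A ~ B + C + B:C for 0/1-coded B and C.

    counts maps (b, c) -> (number with A = 1, cell size). Every cell must
    contain both outcomes, otherwise the log-odds (and the MLE) are infinite.
    """
    l00 = logit(*counts[(0, 0)])
    l10 = logit(*counts[(1, 0)])
    l01 = logit(*counts[(0, 1)])
    l11 = logit(*counts[(1, 1)])
    return {
        "Intercept": l00,                  # log-odds of A when B = 0, C = 0
        "B": l10 - l00,                    # effect of B when C = 0 only
        "C": l01 - l00,                    # effect of C when B = 0 only
        "B:C": (l11 - l01) - (l10 - l00),  # how much B's effect differs when C = 1
    }

# Hypothetical counts: (number with A = 1, cell size) for each (B, C) cell
example_counts = {
    (0, 0): (30, 100), (1, 0): (70, 100),
    (0, 1): (60, 100), (1, 1): (40, 100),
}

for name, value in saturated_coefficients(example_counts).items():
    print(f"{name:9s} {value:+.3f}")
```

Note that in this parameterization the coefficient of B is the effect of B within the reference group C = 0, not an average effect of B, so its p-value tests a different hypothesis than the B coefficient in A ~ B.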

This question is too close to your preceding one to remain open. I would suggest continuing to ask for clarification beneath Maarten's reply on the other thread, unless you really have a very different question. — chl, Apr 24 '13 at 10:11
I understand, thanks. In my eyes this is a slightly different question, in that it asks about the validity of applying a statistical approach rather than about the interpretation of a specific model, but you are right that both come from the same problem I am having. Maarten happened to answer both questions, so I will ask him and anyone else for clarification in the comments on the other question (http://stats.stackexchange.com/questions/57031/interpreting-interaction-terms-in-logit-regression-with-categorical-variables/). — Pygmalion, Apr 24 '13 at 20:51

score 3 · Accepted Answer · answered Apr 24 '13 at 08:54

This kind of pattern can happen when the effect of B on A is positive in one group of C and negative in the other. If you do not include the interaction term between B and C, these two effects can partly or fully cancel out, and you will find an effect close to 0 (or, equivalently, an odds ratio close to 1). So yes, B could be non-significant in a model without the interaction term and become significant when an interaction term is added.
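The cancellation described above can be checked with a small numerical sketch in Python (standard library only). The counts below are hypothetical and deliberately balanced so that the cancellation is exact; with real data it will usually be only partial. For a single 0/1 predictor, the maximum-likelihood slope of a logistic regression equals the sample log odds ratio, and its Wald standard error is sqrt(1/a + 1/b + 1/c + 1/d) over the four cell counts, so no fitting library is needed. The helper names `log_odds_ratio` and `pooled` are illustrative.

```python
import math

def log_odds_ratio(x1, n1, x0, n0):
    """Log odds ratio of A = 1 for group 1 vs group 0, plus its Wald z statistic.

    x = number with A = 1, n = group size. Uses the Woolf standard error
    sqrt(1/a + 1/b + 1/c + 1/d), which matches the Wald SE of the logistic
    slope for a single binary predictor.
    """
    a, b = x1, n1 - x1
    c, d = x0, n0 - x0
    est = math.log((a / b) / (c / d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return est, est / se

# Hypothetical counts: (number with A = 1, group size) for each (B, C) cell.
# B lowers the odds of A when C = 0 and raises them when C = 1.
cells = {
    (0, 0): (70, 100), (1, 0): (30, 100),
    (0, 1): (30, 100), (1, 1): (70, 100),
}

def pooled(b):
    """Collapse over C, which is all that A ~ B sees."""
    x = sum(cells[(b, c)][0] for c in (0, 1))
    n = sum(cells[(b, c)][1] for c in (0, 1))
    return x, n

# A ~ B: effect of B ignoring C
pooled_est, pooled_z = log_odds_ratio(*pooled(1), *pooled(0))
# Effect of B separately within each level of C
est_c0, z_c0 = log_odds_ratio(*cells[(1, 0)], *cells[(0, 0)])
est_c1, z_c1 = log_odds_ratio(*cells[(1, 1)], *cells[(0, 1)])

print(f"A ~ B:        log OR = {pooled_est:+.3f}, z = {pooled_z:+.2f}")
print(f"B within C=0: log OR = {est_c0:+.3f}, z = {z_c0:+.2f}")
print(f"B within C=1: log OR = {est_c1:+.3f}, z = {z_c1:+.2f}")
```

Here A ~ B reports a log odds ratio of exactly 0 for B, while within each group of C the effect of B is large (|z| of about 5.5) and of opposite sign. In A ~ B * C with 0/1 coding, the B coefficient corresponds to the C = 0 estimate and B + B:C to the C = 1 estimate.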

