Sorry if this is a total newbie question, but can I run the following regression formula: Y = X1 + X2 + X3 + X1*X2 + X1*X3, without adding higher-order interaction terms such as X1*X2*X3?
Thanks
Yes, that's perfectly acceptable. It's usually best to base your model on your understanding of the subject matter. If that understanding indicates that only the X1:X2 and X1:X3 interactions are likely to be important, that would generally be OK. Certainly there is no rule in statistics against doing that. There can be a problem when you include interactions and omit individual terms for the predictors, however, as discussed here.
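To make that concrete, here is a minimal sketch in plain Python that fits exactly this model by ordinary least squares. The data and the "true" coefficients are made up for illustration, and the helper names (design_row, solve, fit_ols) are mine. In R or a formula-based Python package you would normally just write something like Y ~ X1 + X2 + X3 + X1:X2 + X1:X3 instead of building the columns by hand.

```python
import random

def design_row(x1, x2, x3):
    # Intercept, three main effects, and only the X1:X2 and X1:X3 interactions.
    # The X2:X3 and X1:X2:X3 products are deliberately left out.
    return [1.0, x1, x2, x3, x1 * x2, x1 * x3]

def solve(a, b):
    # Gaussian elimination with partial pivoting for a square system a x = b.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    # Normal equations: (X'X) beta = X'y.
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Simulated data; these coefficients are hypothetical, chosen only for the demo.
rng = random.Random(0)
true_beta = [1.0, 2.0, -1.0, 0.5, 1.5, -0.75]
data = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
rows = [design_row(*d) for d in data]
y = [sum(b * v for b, v in zip(true_beta, r)) + rng.gauss(0, 0.1) for r in rows]

beta_hat = fit_ols(rows, y)
names = ["Intercept", "X1", "X2", "X3", "X1:X2", "X1:X3"]
for name, b in zip(names, beta_hat):
    print(f"{name:10s} {b: .3f}")
```

The fit estimates six coefficients. Nothing breaks because the three-way term is absent; the model simply assumes that the X1:X2 effect does not itself depend on X3.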
As you are just starting to learn about this, recognize that there is a tradeoff that involves the art of statistical modeling.
It can be best to start with as complex a model as possible that won't overfit your data. See Section 4.1 of Frank Harrell's course notes or book. That could involve several levels of interactions, flexible modeling of continuous predictors, etc. If you have a very large data set, that can be a more productive approach, particularly if your interest is in prediction.
With a more complex model and a limited data set, however, you run risks of overfitting and finding spurious "significant" effects or, as you have to estimate more coefficients from your data with a complex model, losing power to find truly significant effects. With that in mind, only you and your colleagues can weigh the benefits against the risks of a more complex model in any particular circumstance. Harrell's course notes and book provide useful guidance on this.
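As a rough illustration of how quickly the number of coefficients grows, the sketch below counts the terms in a model that includes every interaction up to a given order. It assumes each predictor is continuous or binary, so every term costs one coefficient; categorical predictors with more levels would cost more. The function name n_coefficients is just for this example.

```python
from math import comb

def n_coefficients(k, max_order):
    # Intercept plus one coefficient for every product of up to
    # max_order distinct predictors out of k.
    return 1 + sum(comb(k, j) for j in range(1, max_order + 1))

for k in (3, 5, 10):
    counts = [n_coefficients(k, m) for m in (1, 2, 3)]
    print(f"{k} predictors: main effects={counts[0]}, "
          f"+2-way={counts[1]}, +3-way={counts[2]}")
```

With 3 predictors the counts are 4, 7 and 8, and the model in the question sits in between with 6. With 10 predictors the same three choices require 11, 56 and 176 coefficients.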