Suppose we have a variable $x$ with 3 levels (e.g. $x_1, x_2$ and $x_3$). We want to see if there is an interaction between $x$ and $y$ where $y$ is a continuous variable. To test this would we include the following terms: $x_{1}y, x_{2}y, x_{3}y$ in a regression model?

Crany

1 Answer

First, I want to be clear that $y$ is not your response variable, right? I'll call your response variable $z$. Now, you have a continuous covariate, $y$, and a factor, $x$, with three levels. How you want to do this depends, in part, on the coding scheme you use to indicate the $k$ levels of your factor. I will review methods based on reference cell coding (also called 'dummy coding'). In this scheme, you pick a default (reference) level of your factor (I'll arbitrarily pick $x_1$). Then you form $k-1$ new indicator variables to represent the remaining levels of your factor. For each of these new variables, an observation gets a 1 if it belongs to that level and a 0 otherwise. Now, to form interaction terms, you create $k-1$ more variables by multiplying each of those dummies by your continuous covariate $y$. Thus, in your case the first few rows of the data might look like:

 z     y     x2     x3     x2y     x3y  
6.7   3.4    0      0       0       0  
7.3   2.7    1      0     2.7       0  
5.8   4.4    0      1       0     4.4
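
The columns above can be built mechanically. Here is a minimal sketch in Python with NumPy, using only the three example rows; the variable names (`design`, `x2y`, and so on) are my own choice for illustration, and in practice most statistical software will generate these columns for you from a model formula.

```python
import numpy as np

# The three example rows: response z, covariate y, and the factor level of x
z = np.array([6.7, 7.3, 5.8])
y = np.array([3.4, 2.7, 4.4])
x = np.array(["x1", "x2", "x3"])

# Reference cell coding with x1 as the reference level:
# one 0/1 indicator for each of the remaining k - 1 = 2 levels
x2 = (x == "x2").astype(float)
x3 = (x == "x3").astype(float)

# Interaction columns are the products of each dummy with y
x2y = x2 * y
x3y = x3 * y

# Design matrix: intercept, y, x2, x3, x2*y, x3*y
design = np.column_stack([np.ones_like(y), y, x2, x3, x2y, x3y])
print(design)
```

Note that the reference level $x_1$ gets no column of its own; it is represented by the rows where both dummies are 0.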

And your model would be:

$$ z=\beta_0+\beta_1y+\beta_2x_2+\beta_3x_3+\beta_4x_2y+\beta_5x_3y $$

Within this scheme:

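  • $\beta_0$ is the expected value of $z$ for observations in the $x_1$ (reference) level of your factor when $y=0$, i.e. the intercept for level $x_1$
  • $\beta_1$ is the slope of the relationship between $y$ & $z$ for observations in the $x_1$ level of your factor
  • $\beta_2$ is the difference between the intercept for level $x_2$ and the intercept for level $x_1$
  • $\beta_3$ is the difference between the intercept for level $x_3$ and the intercept for level $x_1$
  • $\beta_4$ is the difference between the slope for observations in the $x_2$ level and the slope in the $x_1$ level
  • $\beta_5$ is the difference between the slope for observations in the $x_3$ level and the slope in the $x_1$ level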
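
Substituting the dummy values into the model above gives one line per level, which may make the role of each coefficient easier to see (this is just a restatement of the same model, level by level):

$$ \begin{aligned} x_1:\quad z &= \beta_0 + \beta_1 y \\ x_2:\quad z &= (\beta_0+\beta_2) + (\beta_1+\beta_4)\,y \\ x_3:\quad z &= (\beta_0+\beta_3) + (\beta_1+\beta_5)\,y \end{aligned} $$

So the three lines are parallel (no interaction) exactly when $\beta_4=\beta_5=0$.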

To test these effects for 'significance', you first add all of the factor level dummies to the model together and perform a simultaneous test that all of their coefficients are equal to zero, then repeat this procedure by adding all of the interaction terms together.
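
One way to carry out the simultaneous test for the interaction terms is a nested-model F test: fit the model with and without the two product terms and compare residual sums of squares. The following is only a sketch in Python with NumPy, on simulated data with made-up coefficients (the data, the seed, and the helper `rss` are all assumptions for illustration, not part of the question).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 150
level = rng.integers(0, 3, size=n)   # 0 -> x1 (reference), 1 -> x2, 2 -> x3
y = rng.uniform(0, 10, size=n)       # continuous covariate
x2 = (level == 1).astype(float)
x3 = (level == 2).astype(float)

# Simulated response with a different slope in each level (made-up values)
z = (1.0 + 0.5 * y + 0.3 * x2 - 0.2 * x3
     + 0.4 * x2 * y - 0.6 * x3 * y + rng.normal(0, 1, size=n))

def rss(X, z):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    return float(resid @ resid)

ones = np.ones(n)
X_reduced = np.column_stack([ones, y, x2, x3])               # no interaction
X_full = np.column_stack([ones, y, x2, x3, x2 * y, x3 * y])  # with interaction

rss_reduced = rss(X_reduced, z)
rss_full = rss(X_full, z)
df_num = X_full.shape[1] - X_reduced.shape[1]  # number of interaction terms
df_den = n - X_full.shape[1]                   # residual df of the full model

# F statistic for H0: beta_4 = beta_5 = 0
F = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
print(f"F({df_num}, {df_den}) = {F:.2f}")
```

The statistic is then compared against an F distribution with `df_num` and `df_den` degrees of freedom to obtain a p-value. The same comparison, with the dummies `x2` and `x3` as the added block, gives the simultaneous test for the factor itself.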

gung - Reinstate Monica
  • So what if the interaction $x_{2}y$ is significant but $x_{3}y$ is not? That wouldn't make sense since parity is one variable. – Crany Feb 07 '12 at 04:17
  • This is just like doing a basic, one-way ANOVA; you first test all levels together with a simultaneous test (ie, F). If that is 'significant', you may want to follow up with a multiple-comparison scheme to infer *which* levels differ. Note that the standard software output for the tests of each level independently does not constitute orthogonal contrasts, as all the other levels are being tested against the same reference cell. – gung - Reinstate Monica Feb 07 '12 at 04:44
  • To this very helpful answer I would just add that it is very useful to plot your data. If you have the software to do it, put z on the vertical axis and x on the horizontal axis, and draw the three different lines for the three different levels of your variable. If you have an interaction effect they will have different slopes (you need to test for statistical significance of course, but visual examination is always a great start). – Peter Ellis Feb 07 '12 at 05:05
  • @PeterEllis, that's an excellent point that I should have covered. – gung - Reinstate Monica Feb 07 '12 at 05:07
  • Sorry, I should have said z on the vertical axis and y on the horizontal, with the three different lines for the three levels of x. – Peter Ellis Feb 07 '12 at 09:59