
Consider a linear regression (based on least squares) on two predictors including an interaction term: $$Y=(b_0+b_1X_1)+(b_2+b_3X_1)X_2$$

$b_2$ here corresponds to the conditional effect of $X_2$ when $X_1=0$. A common mistake is to understand $b_2$ as being the main effect of $X_2$, i.e. the average effect of $X_2$ over all possible values of $X_1$.

Now let's assume that $X_1$ is centered, that is, $\overline{X_1}=0$. It now becomes true that $b_2$ is the average effect of $X_2$ over all possible values of $X_1$, in the sense that $\overline{b_2+b_3X_1}=b_2$. Under these conditions, the meaning given to $b_2$ is nearly indistinguishable from the meaning we would give to the effect of $X_2$ in a simple regression (where $X_2$ is the only predictor; let's call this effect $B_2$).
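As a quick sanity check, here is a minimal R sketch (the coefficients and distributions are made up purely for illustration) verifying that once $X_1$ is centered, the fitted coefficient on $X_2$ coincides with the average of the conditional slopes $b_2+b_3X_1$:

set.seed(1)
N   <- 1e4
X1  <- rnorm(N, 2, 1)                  # deliberately not centered
X2  <- rnorm(N, 5, 1)
Y   <- 10 - 2*X1 + 2*X2 + 10*X1*X2 + rnorm(N)

X1c <- X1 - mean(X1)                   # center X1
m   <- lm(Y ~ X1c * X2)
coef(m)["X2"]                                    # roughly 2 + 10*mean(X1), i.e. about 22
mean(coef(m)["X2"] + coef(m)["X1c:X2"] * X1c)    # average conditional slope: same value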

In practice, it seems that $b_2$ and $B_2$ are reasonably close to each other.

Question:

Are there any "common knowledge" examples of situations where $B_2$ and $b_2$ are remarkably far from each other?

Are there any known upper bounds on $|b_2-B_2|$?


Edit (added after @Robert Long's answer):

For the record, a very rough calculation of what the difference $|b_2-B_2|$ might look like.

$B_2$ can be computed via the usual covariance formula $B_2=\frac{Cov(Y,X_2)}{Var(X_2)}$; expanding $Y$, and noting that the term $b_1\frac{Cov(X_1,X_2)}{Var(X_2)}$ vanishes when $X_1$ and $X_2$ are uncorrelated, this gives $$B_2=b_2+b_3\dfrac{Cov(X_1X_2,X_2)}{Var(X_2)}$$ The last fraction is roughly distributed like the ratio of two normal variables, $\mathcal N(\mu,\frac{3+2\mu^2}{\sqrt N})$ and $\mathcal N(0,\frac{2}{\sqrt N})$ (not independent, unfortunately), assuming that $X_1\sim \mathcal N(0,1)$ and $X_2\sim \mathcal N(\mu,1)$. I've asked a separate question to try to circumvent my limited calculation skills.
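For what it's worth, here is a minimal simulation (arbitrary coefficients) checking this formula numerically; the two quantities should agree up to the sampling noise in the terms the formula neglects:

set.seed(1)
N  <- 1e6
X1 <- rnorm(N, 0, 1)                    # centered and independent of X2
X2 <- rnorm(N, 5, 1)
b2 <- 2; b3 <- 10
Y  <- 10 - 2*X1 + b2*X2 + b3*X1*X2 + rnorm(N)

coef(lm(Y ~ X2))["X2"]                  # B2: the simple-regression slope
b2 + b3 * cov(X1 * X2, X2) / var(X2)    # right-hand side of the formula above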

Arnaud Mortier
  • (+1) Interesting question! Intuitively I think your statement that $b_2$ and $B_2$ are close isn't correct, but I am looking into it :) – Robert Long Jul 21 '20 at 05:50
  • @RobertLong Oh I've just noticed your edit. I'd always thought that "factor" was a synonym for "explanatory variable" - just like "predictor". – Arnaud Mortier Jul 21 '20 at 13:33
  • 1
    I guess it can be used that way but by far the most common use of "factor" is as another word for a "categorical variable." – Robert Long Jul 21 '20 at 14:09

2 Answers


$b_2$ here corresponds to the conditional effect of $X_2$ when $X_1=0$. A common mistake is to understand $b_2$ as being the main effect of $X_2$, i.e. the average effect of $X_2$ over all possible values of $X_1$.

Indeed. I typically answer at least one question per week where this mistake is made. It is also worth pointing out, for completeness, that $b_1$ here corresponds to the conditional effect of $X_1$ when $X_2=0$, and not the main effect of $X_1$, which is easily seen by rearranging the formula:

$$Y=(b_0+b_2X_2)+(b_1+b_3X_2)X_1$$
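By the same logic, here is a minimal R sketch (the coefficients are arbitrary, chosen only for illustration) showing that the fitted coefficient on $X_1$ recovers the slope at $X_2=0$, not the average slope over the observed $X_2$:

set.seed(1)
N  <- 1e5
X1 <- rnorm(N, 0, 1)
X2 <- rnorm(N, 5, 1)
Y  <- 10 - 2*X1 + 2*X2 + 10*X1*X2 + rnorm(N)

m <- lm(Y ~ X1 * X2)
coef(m)["X1"]                                  # about -2: effect of X1 when X2 = 0
coef(m)["X1"] + coef(m)["X1:X2"] * mean(X2)    # about 48: average effect of X1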

In practice, it seems that $b_2$ and $B_2$ are reasonably close to each other.

I think this is false in general for this model; it will only be true when the interaction coefficient $b_3$ is very small.

Are there any "common knowledge" examples of situations where $B_2$ and $b_2$ are remarkably far from each other?

Yes, when $b_3$ is meaningfully large, then $B_2$ and $b_2$ will be meaningfully far apart. I am thinking of how to show this algebraically and graphically, but I don't have much time now, so I will resort to a simple simulation. First, with no interaction:

> set.seed(25)
> N <- 100
> 
> dt <- data.frame(X1 = rnorm(N, 0, 1), X2 = rnorm(N, 5, 1))
> 
> X <- model.matrix(~ X1 + X2 + X1:X2, dt)
> 
> betas <- c(10, -2, 2, 0)
> 
> dt$Y <- X %*% betas + rnorm(N, 0, 1)
> 
> (m1 <- lm(Y ~ X1*X2, data = dt))$coefficients[3]
  X2 
2.06 
> (m2 <- lm(Y ~ X2, data = dt))$coefficients[2]
  X2 
1.96

As expected, the two estimates are close. And now with an interaction:

> set.seed(25)
> N <- 100
> 
> dt <- data.frame(X1 = rnorm(N, 0, 1), X2 = rnorm(N, 5, 1))
> 
> X <- model.matrix(~ X1 + X2 + X1:X2, dt)
> 
> betas <- c(10, -2, 2, 10)
> 
> dt$Y <- X %*% betas + rnorm(N, 0, 1)
> 
> (m1 <- lm(Y ~ X1*X2, data = dt))$coefficients[3]
  X2 
2.06 
> (m2 <- lm(Y ~ X2, data = dt))$coefficients[2]
  X2 
3.29 

Are there any known upper bounds on $|b_2-B_2|$?

I don't think so. As you increase $|b_3|$, $|b_2-B_2|$ should increase, as the sketch below suggests.
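Here is a minimal sketch extending the simulation above (same made-up data-generating process), sweeping the interaction coefficient $b_3$ and recording the resulting gap:

set.seed(25)
N  <- 100
dt <- data.frame(X1 = rnorm(N, 0, 1), X2 = rnorm(N, 5, 1))
X  <- model.matrix(~ X1 + X2 + X1:X2, dt)

gap <- sapply(c(0, 1, 5, 10, 20), function(b3) {
  dt$Y <- X %*% c(10, -2, 2, b3) + rnorm(N, 0, 1)
  abs(coef(lm(Y ~ X1 * X2, data = dt))["X2"] -
      coef(lm(Y ~ X2, data = dt))["X2"])
})
round(gap, 2)    # the gap grows roughly in proportion to |b3|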

Robert Long
  • I see: when $b_2$ is small with respect to the interaction effect, meaning the average slope is close to $0$ compared to the extremal slopes, then $B_2$ is likely to be (statistically insignificant and) very sensitive to error. It makes perfect sense, thank you very much. As for the upper bound I would still be happy with something that depends on $b_3$. But in light of your answer, this second question matters less. I think it would also be interesting to have some kind of estimates of how sensitive $B_2$ becomes in such situations. – Arnaud Mortier Jul 21 '20 at 08:23
  • I agree. I need to expand this answer. The standard error for $B_2$ will become large as the interaction gets larger. In the continuous × continuous case here, the confidence interval gets very wide (actually containing $b_2$ in the simulations I looked at). I will look at this some more later today as it's very interesting, and also look at categorical $X_1$ and $X_2$. – Robert Long Jul 21 '20 at 08:30
  • Regarding the possible values for $|b_2-B_2|$, in fact using the equation we can get $$B_2=b_2+b_3\dfrac{Cov(X_1X_2,X_2)}{Var(X_2)}$$ which does support the fact that the absolute difference grows with $|b_3|$. I believe that the law of the sample covariance of $X_1X_2$ and $X_2$ can be derived from the CLT, which settles this part of the question. – Arnaud Mortier Jul 21 '20 at 11:53
  • Ahh that's very useful. I will incorporate it into my simulations later. – Robert Long Jul 21 '20 at 11:59
  • In an orthogonal factorial design (i.e., a balanced design with equal sample sizes in each combination of $X_1$ and $X_2$), $B_2$ will equal $b_2$. In this case, the covariances are 0. Thus, using @ArnaudMortier's equation, $b_3$ drops out. In an unbalanced design, centering the variables reduces, but does not eliminate, the covariances, thus resulting in a $B_2$ that is generally closer to $b_2$. – dbwilson Jul 21 '20 at 12:08
  • My comment above assumes that the coding of the effects was done in an orthogonal way, such as with effect coding. – dbwilson Jul 21 '20 at 14:40
  • @dbwilson these are numerical, not categorical, variables. The covariance between $X_1$ and $X_2$ in general will not be zero though it is close to zero in my simulation above. I'm going to update it a bit later when I have time. – Robert Long Jul 21 '20 at 15:27
  • @RobertLong Not saying anything is wrong with your answer, but I have now figured out a more direct reason why it is wrong to believe that $b_2$ should be close to $B_2$ in general; I added a new answer below. I will add a simulation of such data when I have time. Thanks again! – Arnaud Mortier May 06 '21 at 14:05

Adding to @RobertLong's answer, there is a slight conceptual mistake in the way $b_2$ is described in the question in the case where $X_1$ is centered. It is indeed true that $b_2$ becomes the average effect of $X_2$ over all possible values of $X_1$, in the sense that $\overline{b_2+b_3X_1}=b_2$, but it should be emphasized that this is an average of simple effects. It may have nothing to do with the main effect of $X_2$ on the DV, which means that $b_2$ may be very far from $B_2$ even without any interaction.

Here is an example where there is no interaction, and $b_2$ and $B_2$ have nothing in common: the vertical axis is the DV $Y$, the horizontal axis is for the covariate $X_2$, and the colors stand for levels of the covariate $X_1$. For any value of $X_1$, the simple effect $b_2+b_3X_1$ is around $-1$, while the main effect $B_2$ is clearly positive.

[Figure: $Y$ plotted against $X_2$, with point colors indicating levels of $X_1$; within each level the slope is about $-1$, while the overall trend is clearly increasing.]
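Here is a minimal R sketch (hypothetical values chosen to reproduce the pattern in the figure) generating such data: each level of $X_1$ shifts both $X_2$ and $Y$ upward, so the conditional slope of $X_2$ is $-1$ everywhere while the marginal slope is positive:

set.seed(1)
N  <- 300
X1 <- sample(1:3, N, replace = TRUE)       # three levels of the covariate X1
X2 <- rnorm(N, mean = 5 * X1, sd = 1)      # X2 increases with X1
Y  <- 10 * X1 - X2 + rnorm(N, 0, 0.5)      # conditional slope of X2 is -1, no interaction

coef(lm(Y ~ X1 + X2))["X2"]    # about -1: the conditional effect b2
coef(lm(Y ~ X2))["X2"]         # clearly positive: the marginal effect B2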

Arnaud Mortier