I am struggling to understand how/if the interaction is connected to mediation. I understand that the interaction in a regression indicates that a variable Z influences the effect of a variable X on the outcome (Y). I am also aware that the variable X can influence Y through another variable, a mediator, in this case, we call it M. This mediation can be either complete or partial, meaning that X can influence Y directly and through M.

However, what I do not quite understand is how these two aspects are related. I guess I am wrong in assuming that if X interacts with Z, I can decompose the influence of Z on the effect of X on Y using mediation analysis.

What I would like to do is to identify all the variables (X and Z) that have a combined effect on Y. I then want to decompose the effect of X and Z on Y into:

  1. interaction (X*Z->Y)
  2. direct effect (X->Y)
  3. mediated effect (X->Z(M)->Y)

Are these simply different problems that require independent regression and mediation analyses, or is there a way to explore the above simultaneously? Is there an R package that does that? Thank you very much!

efrem

1 Answer

Interaction and mediation are different things.

In mediation, we have a causal pathway where one variable causes the mediator and the mediator causes the outcome.

In interaction, we have a joint action, where two variables are associated with an outcome, but the "effect" of one variable depends on the value of the other variable.
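
In equation form, a minimal sketch of the two linear models looks like this (the coefficient symbols are illustrative notation chosen here, not taken from a particular reference):

```latex
\begin{align*}
\text{Interaction:}\quad & Y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ + \varepsilon,
  \qquad \text{slope of } X = \beta_1 + \beta_3 Z \\
\text{Mediation:}\quad & M = \alpha_0 + a X + \varepsilon_M, \qquad
  Y = \gamma_0 + c' X + b M + \varepsilon_Y \\
& \text{total effect of } X = c' + ab
\end{align*}
```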

Clearly these are different things. If we were to do a simple simulation, we might proceed as follows, in R.

First we simulate for an interaction:

set.seed(1)
X <- rnorm(500)
Z <- rnorm(500)
Y <- X + Z + X*Z + rnorm(500) 
lm(Y ~ X * Z)

And we find:

## Coefficients:
## (Intercept)            X            Z          X:Z  
##   -0.006785     0.967882     0.927355     0.973669 

as expected. In particular, we see that the interaction has an estimate close to 1.
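
To make "the effect of one variable depends on the value of the other" concrete, here is a small sketch that reuses the simulation above and computes the implied slope of X at a few values of Z. The helper name `slope_of_X` and the chosen Z values are illustrative assumptions.

```r
set.seed(1)
X <- rnorm(500)
Z <- rnorm(500)
Y <- X + Z + X*Z + rnorm(500)

fit_int <- lm(Y ~ X * Z)
b <- coef(fit_int)

# In Y ~ X * Z the slope of X is not one number: it is b["X"] + b["X:Z"] * Z
# (slope_of_X is a hypothetical helper, defined only for this illustration)
slope_of_X <- function(z) unname(b["X"] + b["X:Z"] * z)

# Implied slope of X at Z = -1, 0 and 1
# (the true values in this simulation are 0, 1 and 2)
sapply(c(-1, 0, 1), slope_of_X)
```

The estimated slopes should land near 0, 1 and 2, so X can appear to have almost no effect, or a strong one, depending on where Z sits.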

Now, for mediation:

set.seed(1)
X <- rnorm(500)
M <- X + rnorm(500)
Y <- X + M + rnorm(500)

Now some care is needed. If we fit the model lm(Y ~ X + M) we obtain:

## Coefficients:
## (Intercept)            X            M  
##   -0.005709     1.043180     0.925210 

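So, here the estimate for X, 1.04, is the direct effect of X on Y, and the estimate for M, 0.92, is the effect of M on Y. Because the X -> M slope is 1 in this simulation, 0.92 also approximates the indirect (mediated) effect, which in general is the product of the X -> M and M -> Y slopes. Typically in inference we would like the total effect, which here should be close to 2, and we can obtain that with: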

lm(Y ~ X)
## Coefficients:
## (Intercept)            X  
##    -0.04731      1.92853  

as expected.
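
As a rough check on this decomposition, the indirect effect in a linear model like this one is usually computed as the product of the X -> M slope and the M -> Y slope. Since the X -> M slope is close to 1 here, that product is close to the coefficient of M. The sketch below repeats the simulation (with `rnorm`); the variable names `a`, `b`, `direct`, `indirect` and `total` are illustrative assumptions.

```r
set.seed(1)
X <- rnorm(500)
M <- X + rnorm(500)
Y <- X + M + rnorm(500)

a        <- unname(coef(lm(M ~ X))["X"])   # X -> M path
fit_y    <- lm(Y ~ X + M)
direct   <- unname(coef(fit_y)["X"])       # X -> Y, holding M fixed
b        <- unname(coef(fit_y)["M"])       # M -> Y, holding X fixed
indirect <- a * b                          # X -> M -> Y
total    <- unname(coef(lm(Y ~ X))["X"])   # X -> Y, ignoring M

c(direct = direct, indirect = indirect, total = total)

# For least-squares fits on the same data, total = direct + indirect
all.equal(total, direct + indirect)
```

The last line holds as an algebraic identity for ordinary least squares, so it confirms the bookkeeping rather than the causal story; the causal reading still depends on the assumed direction of the arrows.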

Robert Long
  • Thank you very much. I think what I do not quite get is whether these concepts are mutually exclusive. In other words, I can have a variable X that influences Y, influences (partially) M and then Y, and also affects Y by its interaction with M. Am I seeing this completely wrong, or is there a way to tease this apart? In mediation, do we assume that M is completely caused by X? Thank you – efrem Aug 09 '21 at 18:43
  • You're welcome. Yes, we can tease it apart as I showed in the simulations (I showed the direct, indirect and total effects, plus an example of interaction). No, we don't assume complete mediation - in that case we would expect a near-zero direct effect instead of 1.04 in the simulation above, and no, we don't have to assume that M is completely caused by X. And yes, these concepts are mutually exclusive. Interactions are also known as *moderation*, and we can have mediated moderation, and moderated mediation (they are not the same). – Robert Long Aug 09 '21 at 19:13
  • I would recommend reading [this answer](https://stats.stackexchange.com/questions/445578/how-do-dags-help-to-reduce-bias-in-causal-inference/445606#445606) which you may find useful. – Robert Long Aug 09 '21 at 19:14
  • Great, the answer linked is a great explanation. Thank you! So basically, if I conduct an exploratory analysis and do not know whether Z interacts with X or mediates it, I would need to run separate regressions to find out. In both cases, if I do not include Z (either as an interaction or as a mediator) in an initial model, I will capture a smaller fraction of the variance in Y. So adding it either as X:Z or as M will help me to "model" Y better. Do I understand this correctly? – efrem Aug 10 '21 at 11:08
  • Yes, although I would strongly recommend that you choose variables based on some theoretically driven considerations - that is, you ought to know whether mediation and/or moderation is likely. – Robert Long Aug 10 '21 at 11:10
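
For the scenario raised in the comments, where X affects M and also interacts with M, a minimal simulated sketch may help. It is not part of the original answer, and all true coefficients are set to 1 purely for illustration. It suggests that both structures can sit in one data-generating process and be recovered when the two models are specified accordingly.

```r
set.seed(1)
n <- 500
X <- rnorm(n)
M <- X + rnorm(n)                 # X partly causes M (M also has its own noise)
Y <- X + M + X*M + rnorm(n)       # direct effect, mediator effect, X-by-M interaction

fit_m <- lm(M ~ X)                # mediator model: X -> M
fit_y <- lm(Y ~ X * M)            # outcome model including the X:M interaction

coef(fit_m)["X"]                  # X -> M slope (true value 1)
coef(fit_y)[c("X", "M", "X:M")]   # true values are all 1
```

In this setup the slope of M on Y is the M coefficient plus the X:M coefficient times X, so the mediated path itself changes with X. Summarising such a model with single direct and indirect numbers needs further assumptions, which is one more reason to let theory decide which structure to fit.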