Simulating data to fit a mediation model

Question

I am interested in finding a procedure to simulate data that are consistent with a specified mediation model. According to the general linear structural equation model framework for testing mediation models first outlined by Barron and Kenny (1986) and described elsewhere such as Judd, Yzerbyt, & Muller (2013), mediation models for outcome $Y$, mediator $\newcommand{\med}{\rm med} \med$, and predictor $X$ and are governed by the following three regression equations: \begin{align} Y &= b_{11} + b_{12}X + e_1 \tag{1} \\ \med &= b_{21} + b_{22}X + e_2 \tag{2} \\ Y &= b_{31} + b_{32}X + b_{32} \med + e_3 \tag{3} \end{align} The indirect effect or mediation effect of $X$ on $Y$ through $\med$ can either be defined as $b_{22}b_{32}$ or, equivalently, as $b_{12}-b_{32}$. Under the old framework of testing for mediation, mediation was established by testing $b_{12}$ in equation 1, $b_{22}$ in equation 2, and $b_{32}$ in equation 3.

So far, I have attempted to simulate values of $\med$ and $Y$ that are consistent with values of the various regression coefficients using rnorm in R, such as the code below:

x   <- rep(c(-.5, .5), 50)
med <- 4 + .7 * x + rnorm(100, sd = 1) 

# Check the relationship between x and med
mod <- lm(med ~ x)
summary(mod)

y <- 2.5 + 0 * x + .4 * med + rnorm(100, sd = 1)

# Check the relationships between x, med, and y
mod <- lm(y ~ x + med)
summary(mod)

# Check the relationship between x and y -- not present
mod <- lm(y ~ x)
summary(mod)

However, it seems that sequentially generating $\med$ and $Y$ using equations 2 and 3 is not enough, since I am left with no relationship between $X$ and $Y$ in regression equation 1 (which models a simple bivariate relationship between $X$ and $Y$) using this approach. This is important because one definition of the indirect (i.e., mediation) effect is $b_{12}-b_{32}$, as I describe above.

Can anyone help me find a procedure in R to generate variables $X$, $\med$, and $Y$ that satisfy constraints that I set using equations 1, 2, and 3?

gung - Reinstate Monica · Answer 1 · 2019-04-02T13:20:32.657

This is quite straightforward. The reason you have no relationship between $x$ and $y$ using your approach is because of the code:

y <- 2.5 + 0 * x + .4 * med + rnorm(100, sd = 1)

If you want some relationship between $x$ and $y$ even when ${\rm med}$ is included (that is, you want partial mediation), you would simply use a non-zero value for $b_{32}$ instead. For example, you could substitute the following code for the above:

y <- 2.5 + 3 * x + .4 * med + rnorm(100, sd = 1)

Thus, $b_{32}$ has been changed from $0$ to $3$. (Of course some other, specific value would probably be more relevant, depending on your situation, I just picked $3$ off the top of my head.)

Edit:
With respect to the marginal $x\rightarrow y$ relationship being non-significant, that is just a function of statistical power. Since the causal force of $x$ is passed entirely through ${\rm med}$ in your original setup, you have lower power than you might otherwise. Nonetheless, the effect is still real in some sense. When I ran your original code (after having set the seed using 90 as a value that I again just picked off the top of my head), I did get a significant effect:

set.seed(90)
x <- rep(c(-.5, .5), 50)
med <- 4 + .7 * x + rnorm(100, sd = 1) 

# Check the relationship between x and med
mod <- lm(med ~ x)
summary(mod)

y <- 2.5 + 0 * x + .4 * med + rnorm(100, sd = 1)

# Check the relationships between x, med, and y
mod <- lm(y ~ x + med)
summary(mod)

# Check the relationship between x and y -- not present
mod <- lm(y ~ x)
summary(mod)

...
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.8491     0.1151  33.431   <2e-16 ***
x             0.5315     0.2303   2.308   0.0231 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

...

To get more power, you can increase the $N$ you are using, or use smaller error values (i.e., use sd= values less than the default 1 in the rnorm() calls).

gung, thanks for your answer. I suppose my question might be a little ambiguous. What I want is not a relationship between x and y in model 3 (which is what you've done), but in model 1 (Y = b11 + b12 * X + e1). I have clarified my question to this effect. — Patrick S. Forscher, Oct 10 '13 at 17:18
Thanks for the edit. Is it possible to directly specify the size of the population effect for coefficient b12? — Patrick S. Forscher, Oct 10 '13 at 17:33
Your question at this point is what would be: what is the population correlation between $x$ & $y$ in general. I wonder if that might be best asked as a new question, as I'm not sure off the top of my head. In the simplest case, where all 3 variables ($x$, $med$, $y$) are normally distributed, & the relationship b/t $x$ & $y$ is *fully* mediated, then $\rho_{x,y} = \rho_{x,med}*\rho_{med,y}$. However, it's more complex if the distributions aren't normal (eg, your $x$ is equal frequencies of $-.5$ & $+.5$), or w/ more complex mediational situations. — gung - Reinstate Monica, Oct 10 '13 at 17:47

POC · Answer 2 · 2020-03-23T17:36:16.397

Here is a paper on how to model simple mediation in Caron & Valois (2018): There R code is

  x <- rnorm(n)
  em <- sqrt(1-a^2)
  m <- a*x + em*rnorm(n)
  ey2 <- sqrt(ey)
  y <- cp*x + b*m + ey2*rnorm(n)
  data <- as.data.frame(cbind(x, m, y))

You just have to specify $n$ (the sample size), $a$, $b$ and $c'$ (direct effect). The advantage here is that you will model standardized coefficients so you'll know their effect sizes. They also included code to unstandardize, carry the Baron & Kenny, Sobel and Bca bootstrap.

References

Caron, P.-O., & Valois, P. (2018). A computational description of simple mediation analysis. The Quantitative Methods for Psychology, 14, 147-158. doi:10.20982/tqmp.14.2.p147

Simulating data to fit a mediation model

2 Answers2

Linked

Related