I am looking to create a function that simulates data arising from a mediation process, where a predictor (X) has an indirect effect on the outcome (Y) through the mediator (M).
I consulted the answers to the following questions:
I would like the function to simulate:
- the mediator and outcome if the user inputs the predictor,
- the predictor and outcome if the user inputs the mediator, or
- the predictor and mediator if the user inputs the outcome.
I would like the user to be able to specify various conditions for simulating the data arising from mediation, including the correlation between X and Y, the correlation between X and M, the correlation between M and Y, and the proportion of the effect mediated. The proportion of the effect mediated (Pm) is the ratio of the indirect effect (ab) to the total effect (Wen & Fan, 2015). I would like the function to simulate data that would yield a mediation model with the conditions specified by the user.
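To make these quantities concrete: if X, M, and Y are all standardized (an assumption, not something stated above), the path coefficients follow from the three correlations, and the total effect equals the correlation between X and Y. Under that assumption, a equals the correlation between X and M, but b is a partial regression coefficient rather than the correlation between M and Y. The helper name pathsFromCorrelations is hypothetical. A minimal sketch:

```r
# Hypothetical helper: standardized mediation paths implied by three
# correlations. Assumes X, M, and Y each have mean 0 and variance 1.
pathsFromCorrelations <- function(rXM, rMY, rXY) {
  a      <- rXM                                   # path X -> M
  b      <- (rMY - rXM * rXY) / (1 - rXM^2)       # path M -> Y, controlling for X
  cPrime <- (rXY - rXM * rMY) / (1 - rXM^2)       # direct effect X -> Y
  cTotal <- rXY                                   # total effect = cPrime + a * b
  list(a = a, b = b, cPrime = cPrime, cTotal = cTotal,
       proportionMediated = (a * b) / cTotal)
}

paths <- pathsFromCorrelations(rXM = 0.5, rMY = 0.6, rXY = 0.6)
str(paths)
```

In this example the indirect effect is 0.5 * 0.4 = 0.2 and the direct effect is 0.4, which sum to the total effect of 0.6, so Pm is one third.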
For instance, I would like the function to estimate:
- the total effect if the user inputs the correlation between X and M, the correlation between M and Y, and proportionMediated (Pm)
- proportionMediated if the user inputs the correlation between X and M, the correlation between M and Y, and the correlation between X and Y
- the correlation between X and M and the correlation between M and Y (assuming they are equal) if the user inputs the correlation between X and Y and proportionMediated
- the correlation between X and M if the user inputs the correlation between M and Y, the correlation between X and Y, and proportionMediated
- the correlation between M and Y if the user inputs the correlation between X and M, the correlation between X and Y, and proportionMediated
I used the answer to the first link (above) in writing the beginnings of a function:
simulateIndirectEffect <- function(x, m, y, a, b, cTotal, proportionMediated, seed){
  if(missing(seed)){
    seed <- round(runif(1, 0, 1000)*100)
  }

  if(missing(cTotal) == TRUE){
    cTotal <- (a * b) / proportionMediated
  } else if(missing(proportionMediated) == TRUE){
    proportionMediated <- (a * b) / cTotal
  } else if(missing(a) == TRUE & missing(b) == TRUE){
    a <- sqrt(proportionMediated * cTotal)
    b <- sqrt(proportionMediated * cTotal)
  } else if(missing(a) == TRUE){
    a <- (proportionMediated * cTotal) / b
  } else if(missing(b) == TRUE){
    b <- (proportionMediated * cTotal) / a
  }

  ab <- a * b
  cPrime <- cTotal - ab

  if(missing(x) == FALSE){
    sampleSize <- length(x)

    set.seed(seed + 1)
    m <- a*x + sqrt(1-a^2) * rnorm(sampleSize) #what should I change error term to?

    error <- 1 - (cPrime^2 + b^2 + 2*a*cPrime*b)

    set.seed(seed + 2)
    y <- cPrime*x + b*m + error*rnorm(sampleSize) #what should I change error term to?
  } else if(missing(m) == FALSE){
    sampleSize <- length(m)

    set.seed(seed + 1)
    #x <- #Not sure what to put here

    set.seed(seed + 2)
    #y <- #Not sure what to put here
  } else if(missing(y) == FALSE){
    sampleSize <- length(y)

    set.seed(seed + 1)
    #x <- #Not sure what to put here

    set.seed(seed + 2)
    #m <- #Not sure what to put here
  }

  simulatedData <- as.data.frame(cbind(x, m, y))

  return(simulatedData)
}
I have three questions:
- How can we simulate m and y given x (and the conditions specified) in the above function?
- How can we simulate x and y given m (and the conditions specified) in the above function?
- How can we simulate x and m given y (and the conditions specified) in the above function?
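One possible approach to all three cases, sketched under the assumptions that the variables are standardized and jointly normal: build the correlation matrix implied by a, b, and cPrime, then draw the two unknown variables from their conditional distribution given the supplied one. The function name simulateOthersGiven and its arguments are hypothetical and not part of the function above.

```r
# Hypothetical sketch: simulate the two missing variables conditional on
# the supplied one. Assumes standardized, jointly normal X, M, and Y.
simulateOthersGiven <- function(z, given = c("x", "m", "y"),
                                a, b, cPrime, seed = 12345) {
  given <- match.arg(given)
  set.seed(seed)
  n <- length(z)
  zStd <- as.numeric(scale(z))          # standardize the supplied variable

  # Correlations implied by the standardized mediation model
  rXM <- a
  rXY <- cPrime + a * b                 # total effect
  rMY <- b + a * cPrime
  vars <- c("x", "m", "y")
  R <- matrix(c(1,   rXM, rXY,
                rXM, 1,   rMY,
                rXY, rMY, 1), nrow = 3, dimnames = list(vars, vars))

  others <- setdiff(vars, given)
  condCoef <- R[others, given]                          # conditional mean slopes
  condCov  <- R[others, others] - tcrossprod(condCoef)  # conditional covariance

  # Correlated noise with covariance condCov via the Cholesky factor
  noise <- matrix(rnorm(n * 2), ncol = 2) %*% chol(condCov)
  sim <- outer(zStd, condCoef) + noise

  out <- data.frame(zStd, sim)
  names(out) <- c(given, others)
  out[, vars]
}

# Usage: total effect .6, proportion mediated .4, equal a and b
a <- b <- sqrt(0.4 * 0.6)
cPrime <- 0.6 - a * b
set.seed(1)
mediator <- rnorm(100000, mean = 50, sd = 10)
simGivenM <- simulateOthersGiven(mediator, given = "m",
                                 a = a, b = b, cPrime = cPrime)
round(cor(simGivenM), 2)
```

With a large sample, the correlation between x and y in the result should be close to the requested total effect, and the supplied variable is returned in standardized form.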
Note that the function above does not appear to simulate the mediation data per the conditions specified. For instance, when I simulate data based on a total effect of .6 and a proportion of the effect mediated of .4, my correlations are way too high. I want my correlation between x and y to be .6 (i.e., the total effect), but it is .99 in the simulated data (see below). I suspect that the noise generated by rnorm(), which has a mean of 0 and an SD of 1, is too small for the error term, but I am not sure what to use instead.
> predictor <- rnorm(1000, mean = 50, sd = 10)
> myData <- simulateIndirectEffect(x = predictor, cTotal = .6, proportionMediated = .4, seed = 12345)
> round(cor(myData), 2)
     x    m    y
x 1.00 0.98 0.99
m 0.98 1.00 0.99
y 0.99 0.99 1.00
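A likely explanation, offered as a sketch rather than a definitive answer: the formulas in the x branch assume that x has a variance of 1, but the predictor here has an SD of 10, so the signal swamps the unit-variance noise. In addition, error is computed as a residual variance but then used as a standard deviation. The hypothetical function simulateGivenX below standardizes x and takes the square root of the residual variance:

```r
# Hypothetical sketch of the "x supplied" case with two changes:
# (1) x is standardized, (2) the residual variance of y is square-rooted.
simulateGivenX <- function(x, a, b, cPrime, seed = 12345) {
  set.seed(seed)
  n <- length(x)
  xStd <- as.numeric(scale(x))                 # mean 0, SD 1

  # m has unit variance: a^2 from x plus (1 - a^2) from noise
  m <- a * xStd + sqrt(1 - a^2) * rnorm(n)

  # Variance of y explained by x and m; the remainder is residual variance
  residVarY <- 1 - (cPrime^2 + b^2 + 2 * a * b * cPrime)
  stopifnot(residVarY >= 0)                    # otherwise the inputs are infeasible
  y <- cPrime * xStd + b * m + sqrt(residVarY) * rnorm(n)

  data.frame(x = xStd, m = m, y = y)
}

# Same conditions as above: total effect .6, proportion mediated .4
a <- b <- sqrt(0.4 * 0.6)
cPrime <- 0.6 - a * b
set.seed(1)
predictor <- rnorm(100000, mean = 50, sd = 10)
simData <- simulateGivenX(predictor, a = a, b = b, cPrime = cPrime)
round(cor(simData), 2)
```

The sample size is increased here only so that the sample correlations settle near their population values; the returned x is the standardized predictor, not the original scale.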
References:
Wen, Z., & Fan, X. (2015). Monotonicity of effect sizes: Questioning kappa-squared as mediation effect size measure. Psychological Methods, 20, 193-203. doi: 10.1037/met0000029