I would like to generate a data set from a specific causal structure (a stylized world) for simulation, similar to this answer, but most of the key variables are binary indicators.
Specifically, my example aims to capture the relationships among surgery, clinical variables, and patient race in the following structural model:
I will then use the simulated data to demonstrate confounding in this setting. However, every variable except 'appropriate' is an indicator variable (1 if true, 0 if false), which makes it tricky to write the system of structural equations as simple linear combinations.
How should I create these equations?
As an attempt in R, I tried to create the structural equations for this stylized world as follows:
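reps <- 1000  # sample size; not defined in the original snippet, so this value is an assumption
Black <- rbinom(reps, 1, 0.5)
U <- rnorm(reps)
# Inverse logit of U, so Appropriate is continuous in (0, 1)
Appropriate <- 1 / (1 + exp(-U))
# Each indicator is 1 when a noisy linear combination of its parents exceeds a cutoff
ST_Elev <- as.numeric(2 * Black + 3 * Appropriate + rnorm(reps) > 2)
LowSES <- as.numeric(3 * Black + rnorm(reps) > 2.5)
Surgery <- as.numeric(5 * Appropriate + LowSES + 2 * ST_Elev + rnorm(reps) > 3.5)
world <- data.frame(Black, ST_Elev, LowSES, Appropriate, Surgery)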
This simply creates indicators with roughly the right proportions by checking whether a noisy linear combination exceeds some cutoff. But this formulation doesn't fit neatly with my (limited) understanding of structural causal models, and I can't figure out how to control the relationships between the variables directly.
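For comparison, one common way to write structural equations for indicator variables is to draw each one from a Bernoulli distribution whose probability is the inverse logit of a linear combination of its parents. The sketch below assumes the same parent sets as the thresholding code above; the sample size `n`, the seed, the name `world_logit`, and every intercept and coefficient are illustrative assumptions, not values taken from the structural model. This is equivalent to thresholding at zero with logistic rather than normal noise, so it is a reparameterization of the same idea rather than a different causal model.

```r
set.seed(1)    # arbitrary seed, only for reproducibility
n <- 10000     # hypothetical sample size

# Exogenous variables
Black <- rbinom(n, 1, 0.5)
U <- rnorm(n)
Appropriate <- plogis(U)   # inverse logit; continuous in (0, 1)

# Each binary child is Bernoulli with a logit-linear probability in its parents.
# All intercepts and coefficients are illustrative assumptions.
ST_Elev <- rbinom(n, 1, plogis(-2.0 + 1.0 * Black + 3.0 * Appropriate))
LowSES  <- rbinom(n, 1, plogis(-1.5 + 2.0 * Black))
Surgery <- rbinom(n, 1, plogis(-3.5 + 5.0 * Appropriate + 1.0 * LowSES + 2.0 * ST_Elev))

world_logit <- data.frame(Black, ST_Elev, LowSES, Appropriate, Surgery)

# Marginal proportion of each indicator (and the mean of Appropriate)
colMeans(world_logit)

# The coefficients are log odds ratios, so a logistic regression of a child
# on its parents should approximately recover the values set above
fit <- glm(LowSES ~ Black, family = binomial, data = world_logit)
coef(fit)
```

In this form each coefficient has a direct reading as a conditional log odds ratio, and the intercepts can be tuned to hit a target prevalence, which may make the strength of each arrow easier to control than adjusting cutoffs.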