Simulate data including systematic error!

Question

For my bachelor thesis, I am doing a simulation study to compare analysis methods.

The generated data follow a pre-post design, with two groups measured at two time points.

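#parameters
n <- 100   # total sample size; not given in the original post, assumed here (must be even)
b0 <- 0
b1 <- .2

#treatment indicator: first half control (0), second half treatment (1)
X <- matrix(0,ncol=1,nrow=n)
X[1:(n/2)] <- 0
X[((n/2)+1):n] <- 1

#mean structure: column 1 is the first measurement, column 2 the second
mean <- matrix(0,ncol=2,nrow=n)
mean[,1] <- b0 + X
mean[,2] <- b0 + b1*X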

When generating the data, I would like to include a systematic bias caused by a lack of randomization. Say males score higher on a particular variable and are over-represented in the treatment group. The gender effect should also differ between the pre-test and the post-test. How would you include this bias when generating the data?

I hope this is the information you need; if not, I will gladly provide more.

It may be helpful to know that I am using R. — Sven Kleine Bardenhorst, Nov 02 '16 at 14:38
Sample an unbiased sample and downsample the "biased" group? — Tim, Nov 02 '16 at 14:39
You may also want to say something about the structure of the data that you'd like to simulate — Ian_Fin, Nov 02 '16 at 14:40
I would like to simulate a pre-post data structure with two groups measured at two times. In the treatment group, an effect is simulated that should be found in the analysis. The idea is to create a systematic bias and to see whether the analysis can estimate this bias and control for it. — Sven Kleine Bardenhorst, Nov 02 '16 at 14:47
You should add that information to your question, not bury it in the comments. Nonetheless, there still isn't enough information here for this to be answerable. — gung - Reinstate Monica, Nov 02 '16 at 21:37
I'm sorry, I will try to give more detailed information by editing the original post! — Sven Kleine Bardenhorst, Nov 03 '16 at 09:31
When you create the treatment effect, why not also create a gender effect, and have the proportion of males be >.5? — Ian_Fin, Nov 03 '16 at 09:59
So when I create a variable Z with >.5 ratio of males, how can I model the higher ratio of males in the treatment group? — Sven Kleine Bardenhorst, Nov 03 '16 at 13:41

score 0 · Accepted Answer · answered Nov 04 '16 at 11:15

I'm not entirely sure I understand your question, but here is my attempt at an answer. I'll simulate a treatment group and a control group, with males over-represented in the treatment group and females over-represented in the control group. Males have a higher pre-test score, and each post-test score is drawn from a distribution whose expected value is half of that individual's pre-test score. Treatment has no effect on the post-test score. The individual difference between pre-test and post-test is the outcome measure in the statistical test that follows. This means that males will have a larger difference than females, and if gender is not taken into account in the analysis, the treatment will appear to be associated with a larger difference in test scores.

First, I create groups with different proportions of males and females:

set.seed(1)
group.size <- 150
trt <- c(rep(0, group.size), rep(1,group.size))
gender <- c(rbinom(group.size,1,0.4), rbinom(group.size,1,0.6))
prop.table(table(trt, gender), margin=1)
   gender
trt         0         1
  0 0.5933333 0.4066667
  1 0.3333333 0.6666667
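
A different way to get the same kind of imbalance, closer to how non-randomized assignment arises, is to draw gender first and let the probability of ending up in the treatment group depend on it. The following is only a sketch: the probabilities 0.65 and 0.35, the seed, and the names n, gender.alt, p.trt and trt.alt are illustrative assumptions, not values from the post.

```r
# Run separately from the rest of the answer: it uses its own seed and object
# names, and drawing extra random numbers would change the values shown here.
set.seed(2)                       # arbitrary seed (assumption)
n <- 300                          # total sample size (assumption)

# Draw gender first (1 = male), with equal proportions overall
gender.alt <- rbinom(n, 1, 0.5)

# Selection mechanism: males are more likely to end up in the treatment group
p.trt <- ifelse(gender.alt == 1, 0.65, 0.35)   # hypothetical probabilities
trt.alt <- rbinom(n, 1, p.trt)

# Row proportions: share of females (0) and males (1) within each group
prop.table(table(trt = trt.alt, gender = gender.alt), margin = 1)
```

With this approach the group sizes are random rather than fixed at 150 each, which may or may not be what a given simulation study needs.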

Now, pre-test and post-test scores are simulated. The pre-test score depends on gender (mean 10 for females and 14 for males). Each post-test score is drawn around half of that individual's pre-test score:

pre.test <- rnorm(group.size*2, 10+gender*4,2)
post.test <- rnorm(group.size*2, pre.test/2, 1)

tapply(pre.test, gender, mean)
       0         1 
9.981824 14.089030 

tapply(post.test, gender, mean)
       0        1 
4.900903 6.994718
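
If the gender effect should be set explicitly and differ between the two measurements, as the question asks, one option is to give gender its own coefficient at each time point. The sketch below reuses b0 and b1 from the question; the gender effects g.pre and g.post, the seed, and the object names are hypothetical choices. As a simplification, it draws pre-test and post-test scores independently given gender and treatment, so it ignores within-person correlation.

```r
# Run separately from the rest of the answer (own seed and object names).
set.seed(3)                       # arbitrary seed (assumption)
n <- 300                          # total sample size (assumption)
trt.b <- rep(c(0, 1), each = n / 2)

# More males (1) in the treatment group: 60% versus 40%, as in the answer
gender.b <- rbinom(n, 1, ifelse(trt.b == 1, 0.6, 0.4))

b0 <- 0                           # intercept, from the question
b1 <- 0.2                         # treatment effect at post-test, from the question
g.pre <- 1.0                      # hypothetical gender effect at pre-test
g.post <- 0.5                     # hypothetical gender effect at post-test

# Treatment only affects the post-test; gender affects both, with different sizes
pre.b  <- rnorm(n, b0 + g.pre * gender.b, 1)
post.b <- rnorm(n, b0 + b1 * trt.b + g.post * gender.b, 1)

# Mean score by gender at each time point
tapply(pre.b, gender.b, mean)
tapply(post.b, gender.b, mean)
```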

The individual difference between pre-test and post-test is calculated, and a linear regression model is then run, with difference as the dependent variable and treatment as the independent variable:

diff <- pre.test - post.test
summary(lm(diff ~ trt))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   5.8741     0.1447  40.591  < 2e-16 ***
trt           0.5748     0.2047   2.808  0.00531 **

The treatment effect is clearly significant: the estimated difference in the treated group is around 10% higher (0.57 on top of an intercept of 5.87). However, when taking gender into account:

summary(lm(diff ~ trt + gender))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.06113    0.14159  35.744   <2e-16 ***
trt          0.05502    0.17807   0.309    0.758    
gender       1.99901    0.17855  11.196   <2e-16 ***

As you can see, the effect of treatment virtually disappears and is far from statistical significance.
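
A single data set only shows one draw. For a simulation study, the same recipe can be wrapped in a function and repeated to see how often each model flags a treatment effect that does not exist. This is a sketch under assumptions: the function name simulate.once, the seed, the number of replications and the 0.05 threshold are illustrative choices, not part of the original answer.

```r
# One simulated data set following the recipe in this answer;
# returns the p-value of the treatment coefficient in both models.
simulate.once <- function(group.size = 150) {
  trt <- rep(c(0, 1), each = group.size)
  gender <- c(rbinom(group.size, 1, 0.4), rbinom(group.size, 1, 0.6))
  pre.test <- rnorm(2 * group.size, 10 + gender * 4, 2)
  post.test <- rnorm(2 * group.size, pre.test / 2, 1)
  diff <- pre.test - post.test
  c(naive    = summary(lm(diff ~ trt))$coefficients["trt", "Pr(>|t|)"],
    adjusted = summary(lm(diff ~ trt + gender))$coefficients["trt", "Pr(>|t|)"])
}

set.seed(123)                     # arbitrary seed (assumption)
n.sims <- 200                     # arbitrary number of replications (assumption)
p.values <- replicate(n.sims, simulate.once())   # 2 x n.sims matrix

# Share of replications with p < 0.05 for the treatment coefficient.
# There is no true treatment effect, so every rejection is a false positive.
rejection.rate <- rowMeans(p.values < 0.05)
rejection.rate
```

The model without gender should reject far more often than the nominal 5%, while the gender-adjusted model should stay close to it. Adding a true treatment effect to post.test inside the function would turn the same loop into a power comparison.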

I hope this was an answer to your question.

Sorry for the late reply, and thanks for your answer! I think this code fits my problem pretty well. — Sven Kleine Bardenhorst, Nov 05 '16 at 18:44
