I want to simulate a binary response variable that depends on two normally distributed continuous predictors, with more 1s than 0s in the response. How can I do this so that a logistic regression does not identify a significant interaction term?
My current approach in R looks like this:
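    n  <- 1e5
    x1 <- rnorm(n)
    x2 <- rnorm(n)
    # latent continuous score: additive in x1 and x2, plus standard normal noise
    y_latent <- x1 + x2 + rnorm(n)
    # dichotomize the latent score at 2
    y  <- ifelse(y_latent > 2, 1, 0)
    df <- data.frame(x1 = x1, x2 = x2, y = y)
    # logistic regression with main effects and the x1:x2 interaction
    fit <- glm(y ~ x1 * x2, data = df, family = binomial(link = "logit"))
    summary(fit)$coefficients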
This usually results in a highly significant interaction term, even though y is just a thresholded version of x1 + x2 plus noise. So how can I simulate a y that depends on both x1 and x2, but not on their interaction?
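For comparison, here is a minimal sketch of one possible alternative: draw y directly from a logistic model whose linear predictor has no x1:x2 term. The coefficients b0, b1 and b2 and the names eta, p, df2 and fit2 are illustrative assumptions, not part of the setup above. The positive intercept is what makes 1s more common than 0s.

```r
set.seed(42)  # arbitrary seed, for reproducibility only
n  <- 1e5
x1 <- rnorm(n)
x2 <- rnorm(n)

# Hypothetical coefficients (assumptions, chosen for illustration only)
b0 <- 1  # positive intercept, so 1s outnumber 0s
b1 <- 1
b2 <- 1

# Additive linear predictor on the logit scale: no x1:x2 term
eta <- b0 + b1 * x1 + b2 * x2
p   <- plogis(eta)                    # inverse logit
y   <- rbinom(n, size = 1, prob = p)  # Bernoulli draws with probability p

df2  <- data.frame(x1 = x1, x2 = x2, y = y)
fit2 <- glm(y ~ x1 * x2, data = df2, family = binomial(link = "logit"))
summary(fit2)$coefficients
mean(y)  # share of 1s in the simulated response
```

Thresholding a score with standard normal noise, as in the code above, corresponds to a probit rather than a logit link, which is one plausible reason the logit fit picks up an interaction there. Replacing `rnorm(n)` with `rlogis(n)` for the noise term in the latent formulation should give a model that is additive on the logit scale as well. Even with a correctly specified model, the interaction term will still come out "significant" at the 5% level in roughly 5% of simulated data sets.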