How to simulate a binary response variable based on two non-interacting continuous variables

Question

I want to simulate a binary response variable which depends on two normally distributed continuous variables, and I want to have more 1s than 0s in the response variable. I wonder how this can be done such that a logistic regression will not identify a significant interaction term.

My current approach in R looks like this:

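n <- 1e5
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- x1 + x2 + rnorm(n)      # latent score: sum of both predictors plus normal noise
y <- ifelse(y > 2, 1, 0)     # dichotomize at a hard threshold
df <- data.frame(x1 = x1, x2 = x2, y = y)
summary(glm(y ~ x1 * x2, data = df, family = binomial(link = "logit")))$coefficients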

This usually results in a highly significant interaction term, even though y is built only from the sum of x1 and x2 (plus noise). So how can I simulate a y that depends on both x1 and x2, but not on their interaction?

score 1 · Answer 1 · edited Apr 13 '17 at 12:44


OK, I found the answer in another post: "How to simulate artificial data for logistic regression?"

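The linear predictor (my initial y, without the added noise) should be transformed to a probability with the logistic function, and the binary y is then drawn from a Bernoulli distribution with that probability. Thresholding x1 + x2 + rnorm(n) instead makes P(y = 1) follow a normal CDF, i.e. a probit rather than a logit relationship, so the logistic fit is misspecified and the interaction term can absorb the mismatch.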
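A minimal sketch of what that looks like in R. The intercept of 1 is my own assumption (chosen only so that there are more 1s than 0s), as are the coefficients of 1 on both predictors and the arbitrary seed:

```r
set.seed(42)                         # arbitrary seed, only for reproducibility

n  <- 1e5
x1 <- rnorm(n)
x2 <- rnorm(n)

# Linear predictor with no interaction term.
# Assumed values: intercept 1 (pushes P(y = 1) above 0.5), slopes of 1.
eta <- 1 + x1 + x2

p <- plogis(eta)                     # logistic function: 1 / (1 + exp(-eta))
y <- rbinom(n, size = 1, prob = p)   # Bernoulli draw for each observation

df  <- data.frame(x1 = x1, x2 = x2, y = y)
fit <- glm(y ~ x1 * x2, data = df, family = binomial(link = "logit"))

summary(fit)$coefficients            # x1:x2 estimate should be close to 0
mean(y)                              # share of 1s in the response
```

Because the data now really come from a logit model without an interaction, the fitted x1:x2 coefficient should sit near zero and the other estimates near their true values. A given run can still flag the interaction as significant by chance at roughly the nominal test level.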

answered Sep 07 '14 at 05:30 by CookieCrusher · edited Apr 13 '17 at 12:44 by Community
