
Suppose we have an observed data matrix $X$ with $N$ rows and $2$ predictor columns. If I wanted to generate continuous response data from it, I might do

$$ Y^{cont} = X\beta + \varepsilon, \quad \varepsilon \sim N(0,1) $$

or in R

Y = beta1*X[,1] + beta2*X[,2]  + rnorm(N, 0, 1)

If instead I wanted to generate binary response data, is it valid to do

$$ Y^{bin} \sim \text{Bernoulli}\left(\sigma(X\beta + \varepsilon)\right), \quad \varepsilon \sim N(0,1) $$

or in R

Y = rbinom(N, 1, sigmoid(beta1*X[,1] + beta2*X[,2]  + rnorm(N, 0, 1)))

where

sigmoid = function(x) 1/(1+exp(-x))

is the sigmoid function? How can I effectively add noise to data in order to generate binary data?

user321627
  • `rbinom` already adds noise: it produces (pseudo-)random numbers. See also this recent post: https://stats.stackexchange.com/q/481391/11849 – Roland Aug 07 '20 at 09:19
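
To illustrate the point in the comment above, here is a minimal sketch, not a definitive recipe. It assumes a simulated `X`, a sample size `N = 10000`, and coefficients `beta1 = 1`, `beta2 = -0.5`, none of which come from the question. It compares two ways of generating a binary response with no extra `rnorm` term: a Bernoulli draw via `rbinom`, and the equivalent latent-variable form that thresholds the linear predictor plus standard logistic noise.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
N <- 10000                        # hypothetical sample size
X <- cbind(rnorm(N), rnorm(N))    # stand-in for the observed predictors
beta1 <- 1                        # hypothetical coefficients
beta2 <- -0.5

sigmoid <- function(x) 1 / (1 + exp(-x))
eta <- beta1 * X[, 1] + beta2 * X[, 2]   # linear predictor, no added noise

# (a) Bernoulli draw: rbinom itself supplies the randomness
Y_a <- rbinom(N, 1, sigmoid(eta))

# (b) Latent-variable form: P(eta + e > 0) = sigmoid(eta) when e is
#     standard logistic, so this is the same model as (a)
Y_b <- as.integer(eta + rlogis(N) > 0)

# Fit a logistic regression without intercept to each simulated response
fit_a <- glm(Y_a ~ X - 1, family = binomial)
fit_b <- glm(Y_b ~ X - 1, family = binomial)
print(coef(fit_a))
print(coef(fit_b))
```

Both fits should give estimates close to the assumed `beta1` and `beta2`. Putting an additional `rnorm` term inside the sigmoid, as in the question, still defines a valid generating process, but it is a different model (a logistic model with an unobserved normal effect per observation), and a plain logistic fit to such data would tend to give coefficients shrunk toward zero.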
