
I want to simulate a data set for logistic regression in which $Y_i \sim \text{Bin}(n_i, p_i)$ with $n_i > 1$ for all $i$. I want something like:

[Image: example of the desired data set, not available]

In another question, data were generated for a logistic regression in which $n_i = 1$. I am unsure whether it would be correct to follow that method, then group the observations by their $x$ values and call each group a population. I'm not sure how to do this without introducing some bias into the data that the logistic regression would not account for. I'm looking for an explicit description of how to account for $n_i > 1$, ideally in R.
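
One way to account for $n_i > 1$ in R is to keep one row per group and pass the response to `glm()` as a two-column matrix `cbind(successes, failures)`. The sketch below is a hypothetical example: it assumes 200 groups, $n_i$ drawn uniformly from 2 to 20, and true coefficients 1 and 2 (as in `z = 1 + 2*x1` in the code further down). It fits the grouped model, then expands the same data into one 0/1 row per trial and fits an ordinary Bernoulli logistic regression. Both fits maximize the same likelihood up to a constant, so the coefficient estimates should agree.

```r
# Hypothetical example: grouped binomial data and the equivalent 0/1 data.
set.seed(42)
k <- 200                                      # number of groups (assumed)
x <- rnorm(k)                                 # one covariate value per group
n <- sample(2:20, size = k, replace = TRUE)   # trials per group, n_i > 1 (assumed range)
p <- plogis(1 + 2 * x)                        # inverse logit of the linear predictor
y <- rbinom(k, size = n, prob = p)            # successes per group

# Grouped fit: response is cbind(successes, failures)
grouped <- data.frame(x = x, n = n, y = y)
fit_grouped <- glm(cbind(y, n - y) ~ x, family = binomial, data = grouped)

# Expanded fit: one row per trial, each with its group's x value
long <- data.frame(
  x = rep(x, times = n),
  b = unlist(mapply(function(s, m) rep(c(1, 0), times = c(s, m - s)),
                    y, n, SIMPLIFY = FALSE))
)
fit_long <- glm(b ~ x, family = binomial, data = long)

coef(fit_grouped)
coef(fit_long)   # same estimates, up to the convergence tolerance of glm()
```

This equivalence relies on every trial in a group sharing the same covariate value. Pooling trials with different $x$ values into one bin and assigning them a single $x$ would fit a different model.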

EDIT: Adapting the code from the other question, here is what I have:

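```r
set.seed(1)
x1 <- rnorm(6)                            # one continuous covariate value per group
n  <- round(runif(6, min = 1, max = 20))  # number of trials n_i in each group
z  <- 1 + 2 * x1                          # linear predictor
pr <- 1 / (1 + exp(-z))                   # pass through the inverse-logit function
y  <- matrix(0, 6, 1)                     # successes per group
# count the successes among n[i] Bernoulli(pr[i]) draws
for (i in 1:6) { y[i] <- sum(rbinom(n[i], 1, pr[i]) == 1) }

Y <- y / n                                # observed proportion of successes
```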

Are there any reasons this is not a reasonable way of doing things?
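
The loop in the code above draws $n_i$ Bernoulli outcomes per group and counts the successes. A short derivation shows why this is equivalent to a single binomial draw, and why the grouped likelihood leads to the same estimates as the ungrouped one. The symbols $B_{ij}$ (the $j$-th 0/1 outcome in group $i$), $\beta_0$ and $\beta_1$ are introduced here for illustration; the code uses $\beta_0 = 1$, $\beta_1 = 2$:

$$
\begin{aligned}
B_{i1}, \dots, B_{in_i} &\overset{\text{iid}}{\sim} \text{Bernoulli}(p_i), \qquad p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_i)}}, \\
y_i &= \sum_{j=1}^{n_i} B_{ij} \sim \text{Bin}(n_i, p_i), \\
\ell_{\text{grouped}}(\beta_0, \beta_1) &= \sum_i \left[ \log\binom{n_i}{y_i} + y_i \log p_i + (n_i - y_i) \log(1 - p_i) \right] \\
&= \ell_{\text{Bernoulli}}(\beta_0, \beta_1) + \sum_i \log\binom{n_i}{y_i}.
\end{aligned}
$$

Here $\ell_{\text{Bernoulli}} = \sum_i \sum_j \left[ B_{ij} \log p_i + (1 - B_{ij}) \log(1 - p_i) \right]$. The extra term does not depend on $\beta_0$ or $\beta_1$, so both log-likelihoods have the same maximizer.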

Asked by WeakLearner

Answer

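Your approach is correct. What matters is that all $n_i$ trials in group $i$ share the same success probability $p_i$. Because the sum of $n_i$ independent Bernoulli($p_i$) draws has a Bin($n_i$, $p_i$) distribution, `sum(rbinom(n[i], 1, pr[i]) == 1)` can be replaced by a single call `rbinom(1, n[i], pr[i])`, which makes the code simpler: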

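```r
set.seed(1)
x1 <- rnorm(6)                                # one continuous covariate value per group
n  <- sample(1:20, size = 6, replace = TRUE)  # trials per group, uniform on 1..20
z  <- 1 + 2 * x1                              # linear predictor
pr <- 1 / (1 + exp(-z))                       # pass through the inverse-logit function
y  <- matrix(0, 6, 1)                         # successes per group
# one Bin(n[i], pr[i]) draw per group
for (i in 1:6) { y[i] <- rbinom(1, n[i], pr[i]) }

Y <- y / n                                    # observed proportion of successes
```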
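
To check that this simulation does what is intended, one can scale it up and confirm that a grouped logistic fit recovers the true intercept 1 and slope 2. The sketch below assumes 5000 groups (an arbitrary choice) and replaces the loop with one vectorized `rbinom()` call, which recycles `size` and `prob` element-wise; `plogis()` is base R's inverse logit, equal to `1/(1+exp(-z))`.

```r
# Sketch: parameter recovery for the grouped simulation with many groups.
set.seed(1)
k  <- 5000                                    # number of groups (assumed, large)
x1 <- rnorm(k)                                # one covariate value per group
n  <- sample(1:20, size = k, replace = TRUE)  # trials per group, as in the answer
pr <- plogis(1 + 2 * x1)                      # inverse logit of 1 + 2 * x1
y  <- rbinom(k, size = n, prob = pr)          # one Bin(n[i], pr[i]) draw per group
Y  <- y / n                                   # observed proportions

# Proportions as the response, with the number of trials as prior weights
fit <- glm(Y ~ x1, family = binomial, weights = n)
est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))
cbind(estimate = est, std.error = se)         # estimates should be close to 1 and 2
```

`glm(Y ~ x1, family = binomial, weights = n)` fits the same model as `glm(cbind(y, n - y) ~ x1, family = binomial)`: with proportions as the response, the prior weights supply the number of trials.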
Answered by Otto_K