I am trying to run a simulation with logistic regression but got stuck. Why do I get only ~71% accuracy even when I use the ground-truth coefficients for prediction?
set.seed(0)
n <- 1e5
p <- 5
X <- matrix(rnorm(n*p), ncol=p)
beta <- runif(p)
y <- rbinom(n,1,prob = plogis(X %*% beta))
Note that we can estimate beta using glm. The estimate is pretty close to the true value when the sample size is large.
> glm(y~X-1,family="binomial")$coefficients
        X1         X2         X3         X4         X5
0.68415400 0.59206451 0.29157944 0.84165069 0.08466564
> beta
[1] 0.68309592 0.60590097 0.30353578 0.83300563 0.07931528
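To put a number on "pretty close", here is a minimal sketch that compares the two vectors. The values are copied by hand from the output above rather than recomputed, and the names beta_hat, beta_true and abs_err are only illustrative.

```r
# Estimated coefficients, copied from the glm output above.
beta_hat <- c(X1 = 0.68415400, X2 = 0.59206451, X3 = 0.29157944,
              X4 = 0.84165069, X5 = 0.08466564)

# True coefficients, copied from the printed beta above.
beta_true <- c(0.68309592, 0.60590097, 0.30353578, 0.83300563, 0.07931528)

# Absolute estimation error per coefficient.
abs_err <- abs(beta_hat - beta_true)
abs_err

# Largest error across the five coefficients (it is the X2 term, about 0.014).
max(abs_err)
```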
But suppose we use the ground-truth beta instead. Here is the prediction using the ground truth, and the resulting confusion matrix:
table(y,plogis(X %*% beta)>0.5)
y   FALSE  TRUE
  0 35499 14425
  1 14456 35620
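For reference, the ~71% figure is the share of observations on the diagonal of this table. A minimal sketch of that arithmetic, with the counts copied by hand from the output above (the names cm and accuracy are only illustrative):

```r
# Confusion-matrix counts copied from the table above.
# Rows: actual y (0, 1); columns: predicted class (FALSE, TRUE).
# matrix() fills column by column, so the first two numbers are the FALSE column.
cm <- matrix(c(35499, 14456, 14425, 35620), nrow = 2,
             dimnames = list(y = c("0", "1"),
                             predicted = c("FALSE", "TRUE")))

# Accuracy = correctly classified observations / all observations.
accuracy <- sum(diag(cm)) / sum(cm)
accuracy  # 0.71119
```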