I have a data set that contains a continuous explanatory variable and a binary response recorded as successes (1) and failures (0). For example,
require(stats)
test.data <- data.frame(variable = runif(1000,100,200))
make.data <- function(x) {
  # return 1 with a probability that increases with x (plus uniform noise), else 0
  if (runif(1, 0, 1) <= ((x + runif(1, -50, 50) - 100) / 100)) 1 else 0
}
test.data$response <- sapply(test.data$variable, make.data)
head(test.data)
# variable response
#1 171.4345 1
#2 186.9876 0
#3 122.4847 0
#4 189.0977 1
#5 109.0487 0
#6 157.7554 1
It's easy enough to fit a glm to these data and get valid results, e.g.
glm.test <- glm(response ~ variable, data = test.data, family = binomial("logit"))
Somehow, the logit link function embedded in glm seems able to cope with responses that are exactly zero or exactly one. If I apply the link function to the responses manually, e.g.
logit_func <- make.link("logit")$linkfun
test.data$link_response <- sapply(test.data$response, logit_func)
For obvious reasons, this returns a vector of Inf and -Inf values.
head(test.data)
# variable response link_response
#1 185.1213 1 Inf
#2 150.7970 1 Inf
#3 178.1121 0 -Inf
#4 127.2224 1 Inf
#5 132.4209 0 -Inf
#6 195.1341 1 Inf
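To make the reason explicit, here is a minimal sketch of the logit evaluated on a few probabilities, including the two boundary values. The vector `p` is an illustrative assumption, not part of the data above.

```r
# logit(p) = log(p / (1 - p)); the probabilities below are illustrative only
logit_func <- make.link("logit")$linkfun
p <- c(0, 0.25, 0.5, 0.75, 1)
out <- logit_func(p)
print(out)
# the end points (p = 0 and p = 1) give -Inf and Inf; interior values are finite

# the same values computed by hand from the definition
manual <- log(p / (1 - p))
print(isTRUE(all.equal(out, manual)))
```

Only probabilities strictly between 0 and 1 map to finite values, which is why the raw 0/1 responses cannot be transformed directly.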
So my question is: what is the logit link function embedded in glm doing that the standard logit link function is not? How could I emulate it?
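One possible way to emulate the fit by hand is sketched below, on the assumption (as documented for `glm.fit`) that the model is fitted by iteratively reweighted least squares. Under that assumption the link is applied to the current fitted mean `mu` rather than to the raw 0/1 responses, and the binomial family starts `mu` at `(y + 0.5) / 2` for unit weights, so it stays strictly between 0 and 1. The helper name `irls_logit`, the seed, the tolerance, and the vectorised data generation are illustrative choices, not part of `glm`.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
n <- 1000
x <- runif(n, 100, 200)
# vectorised equivalent of the data-generating function in the question
y <- as.numeric(runif(n) <= (x + runif(n, -50, 50) - 100) / 100)
test.data <- data.frame(variable = x, response = y)

# hypothetical helper: IRLS for a logit model with unit prior weights
irls_logit <- function(X, y, tol = 1e-8, max_iter = 25) {
  mu   <- (y + 0.5) / 2      # starting means: 0.25 for y = 0, 0.75 for y = 1
  eta  <- qlogis(mu)         # logit of the means, finite by construction
  beta <- rep(0, ncol(X))
  for (i in seq_len(max_iter)) {
    w <- mu * (1 - mu)       # working weights for the logit link
    z <- eta + (y - mu) / w  # working response
    # weighted least squares step: solve (X'WX) beta = X'Wz
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    eta <- drop(X %*% beta_new)
    mu  <- plogis(eta)       # inverse logit
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  beta
}

X <- model.matrix(~ variable, data = test.data)
beta_hand <- irls_logit(X, test.data$response)
glm.test  <- glm(response ~ variable, data = test.data, family = binomial("logit"))
print(cbind(by_hand = beta_hand, glm = coef(glm.test)))
```

If the assumption holds, the two columns should agree closely, since the logit is only ever evaluated on means that lie strictly inside (0, 1).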