Creating observed/expected ratio using logistic regression

Question

I am using logistic regression to benchmark the performance of students across different years. I set up the following scenario:

mydata         <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
benchmark.data <- mydata[1:300,] # students from years 1990-1995 as benchmark
compare.data   <- mydata[301:400,] # students from year 1996

# logistic regression model created using benchmark student result
temp.glm <- glm(admit~gre+gpa+rank, data=benchmark.data, family="binomial")

# using the regression model to predict how students in 1996 perform
compare.data[,"predict"] <- predict(temp.glm, newdata=compare.data, type="response")

# making a threshold such that if the predicted chance of admit > 0.5, 
#  then it is assumed that the student will get admitted
compare.data[,"predict_admit"] <- ifelse(compare.data[,"predict"]>0.5, 1, 0)
table(compare.data[,c("admit", "predict_admit")])

#      predict_admit
# admit  0  1
#     0 59  6
#     1 26  9

From the table, 15 students were predicted to be admitted (6 + 9), whereas 35 students were actually admitted (26 + 9), so the observed/expected ratio is 35/15 = 2.33. Since this is larger than 1, I would conclude that the students in 1996 performed better than the benchmark.
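For reference, the arithmetic behind this threshold-based ratio can be reproduced from the table alone. This is only a minimal sketch: the counts are copied from the table above, and the object names (`conf`, `observed`, `expected_threshold`, `oe_ratio`) are illustrative and not part of the original code.

```r
# Confusion table from above (rows: actual admit, columns: predicted admit).
# matrix() fills column by column, so c(59, 26) is the predict_admit = 0 column.
conf <- matrix(c(59, 26, 6, 9), nrow = 2,
               dimnames = list(admit = c("0", "1"),
                               predict_admit = c("0", "1")))

observed           <- sum(conf["1", ])   # actually admitted: 26 + 9 = 35
expected_threshold <- sum(conf[, "1"])   # predicted admitted at 0.5: 6 + 9 = 15

oe_ratio <- observed / expected_threshold  # 35 / 15, about 2.33
print(oe_ratio)
```

Note that the "expected" count here depends entirely on the 0.5 cutoff; moving the cutoff changes the denominator and therefore the ratio.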

Can I draw my conclusion using the method mentioned above?
Also, how should I set the threshold? Or should I instead compute sum(compare.data[,"predict"]) and treat that as the expected value?
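The second option, summing the predicted probabilities, would look roughly like the sketch below. Under the fitted model the expected number of admissions is the sum of the individual probabilities, so no threshold is involved. Because the data link may no longer work, the sketch uses simulated data: the data frame `sim`, its coefficients, and all object names are assumptions made for illustration, not the real admissions data.

```r
set.seed(1)  # hypothetical simulated data, NOT the UCLA binary.csv file
n <- 400
sim <- data.frame(
  gre  = round(rnorm(n, 580, 115)),
  gpa  = round(runif(n, 2.2, 4.0), 2),
  rank = sample(1:4, n, replace = TRUE)
)
# Made-up coefficients, used only to generate a plausible binary outcome
sim$admit <- rbinom(n, 1, plogis(-3.5 + 0.002 * sim$gre +
                                   0.8 * sim$gpa - 0.55 * sim$rank))

benchmark <- sim[1:300, ]    # plays the role of benchmark.data
compare   <- sim[301:400, ]  # plays the role of compare.data

fit   <- glm(admit ~ gre + gpa + rank, data = benchmark, family = "binomial")
p_hat <- predict(fit, newdata = compare, type = "response")

observed <- sum(compare$admit)  # admissions actually seen in the comparison year
expected <- sum(p_hat)          # admissions the benchmark model expects
oe_ratio <- observed / expected
print(c(observed = observed, expected = expected, oe_ratio = oe_ratio))
```

On the benchmark sample itself, `sum(fitted(fit))` equals the observed number of admissions (a property of logistic regression with an intercept), so the ratio is 1 there by construction. A ratio computed on the comparison year still carries sampling variability, so a value above 1 is not by itself evidence of better performance.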

Update 1

I tried using an ROC curve to determine the threshold:

library(ROCR)
benchmark.data[,"predict"] <- predict(temp.glm, newdata=benchmark.data, type="response")
preds <- prediction(benchmark.data[,"predict"], as.numeric(benchmark.data[,"admit"]))
plot(performance(preds,"tpr","fpr"), print.cutoffs.at=seq(0,1,by=0.05))

The chart suggests that a threshold of 0.35 seems to maximize sensitivity and specificity.
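One way to make "maximize sensitivity and specificity" concrete, instead of reading it off the plot, is to compute sensitivity + specificity - 1 (Youden's J) at each cutoff and take the largest. The sketch below does this in base R on simulated scores; the vectors `y` and `p`, the cutoff grid, and the use of Youden's J as the criterion are assumptions for illustration, not something taken from the ROCR output above.

```r
set.seed(2)  # hypothetical labels and predicted probabilities, illustration only
y <- rbinom(300, 1, 0.3)                  # 1 = admitted, 0 = not admitted
p <- plogis(-1 + 1.2 * y + rnorm(300))    # scores that are higher on average when y = 1

cutoffs <- seq(0.05, 0.95, by = 0.05)

# Youden's J = sensitivity + specificity - 1 at each candidate cutoff
youden <- sapply(cutoffs, function(ct) {
  pred <- as.integer(p > ct)
  sens <- sum(pred == 1 & y == 1) / sum(y == 1)
  spec <- sum(pred == 0 & y == 0) / sum(y == 0)
  sens + spec - 1
})

best_cutoff <- cutoffs[which.max(youden)]
print(best_cutoff)
```

With real data, `p` would be benchmark.data[,"predict"] and `y` would be benchmark.data[,"admit"]. A cutoff chosen this way is tuned for classification and does not make the thresholded count a better estimate of the expected number of admissions.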

Unfortunately, your link has rotted. Your "benchmark" and "comparison" data are probably more commonly called "training" and "testing" samples. Regarding thresholds and sensitivity & specificity, you may be interested in [Why is accuracy not the best measure for assessing classification models?](https://stats.stackexchange.com/q/312780/1352) — Stephan Kolassa, Nov 27 '17 at 08:23
