My understanding is that logistic regression assumes a linear relationship between the logit of the outcome and each predictor variable.
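To make that concrete, here is a minimal simulated sketch of the assumption (toy data, nothing from my case study): when it holds, the log-odds of the outcome are a straight-line function of the predictor.

set.seed(1)
x <- rnorm(1000)
p <- plogis(-0.5 + 2 * x)    # logit(p) = -0.5 + 2*x, i.e. linear in x
y <- rbinom(1000, 1, p)
fit <- glm(y ~ x, family = binomial)
plot(x, predict(fit, type = "link"), ylab = "fitted log-odds")  # straight line when the assumption holds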
I'm working on a case study from this MIT course. My model is making really poor predictions, and I suspect it's because of non-linearity.
# split Book into train / validation / test (roughly 45/35/20)
idx <- sample(seq(1, 3), size = nrow(Book), replace = TRUE, prob = c(.45, .35, .2))
train <- Book[idx == 1, ]
val <- Book[idx == 2, ]
test <- Book[idx == 3, ]
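(There's no set.seed() above, so the split changes between runs; a quick sanity check that it came out roughly as intended:)

prop.table(table(idx))   # proportions should be close to 0.45 / 0.35 / 0.20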
# logistic regression of Florence on all other columns
glm.fit1 <- glm(Florence ~ ., family = binomial, data = train)
summary(glm.fit1)

# predicted probabilities on the test set, thresholded at 0.5
glm.probs1 <- predict(glm.fit1, test, type = "response")
glm.pred1 <- rep("0", nrow(test))
glm.pred1[glm.probs1 > .5] <- "1"
This is the confusion matrix:

> table(glm.pred1, test$Florence)

glm.pred1   0   1
        0 787  73
        1   0   1
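Reading the numbers off that table, almost all of the accuracy comes from the majority class; the model predicts "1" exactly once:

cm <- matrix(c(787, 0, 73, 1), nrow = 2,
             dimnames = list(pred = c("0", "1"), actual = c("0", "1")))
sum(diag(cm)) / sum(cm)        # accuracy: (787 + 1) / 861 ~ 0.915
sum(cm[, "0"]) / sum(cm)       # always predicting 0: 787 / 861 ~ 0.914
cm["1", "1"] / sum(cm[, "1"])  # sensitivity: 1 / 74 ~ 0.014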
How can I check whether that linearity assumption actually holds?
What I've tried: I plotted each predictor against the log-odds (logit) of the predicted probabilities that came out of my model, but I was told that doesn't work for poorly performing classifiers. Here is a post with info.
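If it helps, this is roughly the plot I tried (a sketch of the usual predictor-vs-logit check; x is a stand-in for one numeric column of train, not a real column name):

probs <- predict(glm.fit1, train, type = "response")
logit <- log(probs / (1 - probs))           # log-odds of the fitted probabilities
plot(train$x, logit)                        # x = placeholder numeric predictor
lines(lowess(train$x, logit), col = "red")  # smoother should be roughly straight if the logit is linear in x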