I'm playing around with logistic regression in R, using the mtcars dataset, and I decide to fit a logistic regression model for the 'am' variable (that is, automatic vs. manual transmission, for those of you familiar with the mtcars dataset).
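For reference, the model was fit like this (logit_fit is the object name I use further down; the call matches the one shown in the summary):

logit_fit <- glm(am ~ mpg + qsec + wt, family = binomial, data = mtcars)
summary(logit_fit)

The summary output is: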
Call:
glm(formula = am ~ mpg + qsec + wt, family = binomial, data = mtcars)

Deviance Residuals:
       Min          1Q      Median          3Q         Max
-4.484e-05  -2.100e-08  -2.100e-08   2.100e-08   5.163e-05

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   924.89  883764.07   0.001    0.999
mpg            20.65   18004.32   0.001    0.999
qsec          -55.75   32172.52  -0.002    0.999
wt           -111.33  103183.48  -0.001    0.999

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4.3230e+01  on 31  degrees of freedom
Residual deviance: 6.2903e-09  on 28  degrees of freedom
AIC: 8

Number of Fisher Scoring iterations: 25
Now, at first sight this looks like a terrible regression, right? The standard errors are HUGE, the z-values are all close to zero, and the corresponding p-values are all close to one. HOWEVER, the residual deviance is extremely small!
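As a quick sanity check on that last point, the fitted probabilities themselves are already pushed hard against 0 and 1 (looking at their range should make that obvious):

range(fitted(logit_fit)) # fitted probabilities on the response scale; essentially 0 and 1 here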
I decide to check how well the model does as a classification model by running:
pred <- predict(logit_fit, data.frame(qsec = mtcars$qsec, wt = mtcars$wt, mpg = mtcars$mpg), type = "response") # Predict probabilities on the training data
mtcars$pred_r <- round(pred, 0) # Round each probability to the nearest 0 or 1
table(mtcars$am, mtcars$pred_r) # Check whether the classification results are any good
Indeed, the model perfectly predicts the data:
     0  1
  0 19  0
  1  0 13
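Or, boiled down to a single number, the in-sample accuracy (this is just re-reading the table above):

mean(mtcars$pred_r == mtcars$am) # proportion classified correctly; this comes out to 1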
Have I completely misunderstood how to interpret the model output? Am I overfitting massively, or what is actually going on here?