I fit a logistic regression on the newshad2 data and checked its classification accuracy. Here is the code and output:
myfit <- glm(as.factor(chd) ~ ., data = newshad2, family = binomial(link='logit'))
summary(myfit)
Call:
glm(formula = as.factor(chd) ~ ., family = binomial(link = "logit"),
    data = newshad2)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.778  -0.821  -0.439   0.889   2.543

Coefficients:
             Estimate Std. Error z value   Pr(>|z|)
(Intercept) -7.076091   1.340486   -5.28 0.00000013 ***
sbp          0.006504   0.005730    1.14    0.25637
tobacco      0.079376   0.026603    2.98    0.00285 **
ldl          0.173924   0.059662    2.92    0.00355 **
adiposity    0.018587   0.029289    0.63    0.52570
typea        0.039595   0.012320    3.21    0.00131 **
obesity     -0.062910   0.044248   -1.42    0.15509
alcohol      0.000122   0.004483    0.03    0.97835
age          0.045225   0.012130    3.73    0.00019 ***
famhist      0.925370   0.227894    4.06 0.00004896 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 596.11  on 461  degrees of freedom
Residual deviance: 472.14  on 452  degrees of freedom
AIC: 492.1

Number of Fisher Scoring iterations: 5
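(A side note on interpretation: the coefficients above are on the log-odds scale, so exponentiating them gives odds ratios. A quick check against the fitted model above:)

# exp() turns log-odds coefficients into odds ratios; for example,
# exp(0.925370) is about 2.52, i.e. a positive family history multiplies
# the odds of CHD by roughly 2.5, holding the other predictors fixed.
exp(coef(myfit))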
> myfit.pred <- predict(myfit, type = "response")
> head(myfit.pred)
   1    2    3    4    5    6
0.71 0.33 0.28 0.72 0.69 0.62
> myfit.prob <- ifelse(myfit.pred > 0.5, 1, 0)
> table(myfit.prob, newshad2$chd)
myfit.prob   0   1
         0 256  77
         1  46  83
> mean(myfit.prob == newshad2$chd)
[1] 0.73
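(The 0.5 cutoff is just a convention, and since class 1 is the minority here, raw accuracy can be misleading on its own. A threshold-free check of the model, assuming the pROC package is installed:)

library(pROC)
# AUC summarizes ranking performance over all possible cutoffs,
# using the fitted probabilities rather than the 0/1 labels.
roc_obj <- roc(newshad2$chd, myfit.pred)
auc(roc_obj)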
> confusionMatrix(myfit.prob, newshad2$chd)
Confusion Matrix and Statistics
          Reference
Prediction   0   1
         0 256  77
         1  46  83
Accuracy : 0.734
95% CI : (0.691, 0.774)
No Information Rate : 0.654
P-Value [Acc > NIR] : 0.000137
Kappa : 0.384
Mcnemar's Test P-Value : 0.006830
Sensitivity : 0.848
Specificity : 0.519
Pos Pred Value : 0.769
Neg Pred Value : 0.643
Prevalence : 0.654
Detection Rate : 0.554
Detection Prevalence : 0.721
Balanced Accuracy : 0.683
'Positive' Class : 0
>
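(For reference, confusionMatrix() comes from the caret package. Recent caret versions require both arguments to be factors, so the call may need to look like the sketch below; the positive = "1" argument would make sensitivity refer to the CHD cases rather than to class 0, as reported above.)

library(caret)
# Newer caret versions insist on factor inputs with matching levels;
# positive = "1" treats the CHD cases as the positive class.
confusionMatrix(factor(myfit.prob, levels = c(0, 1)),
                factor(newshad2$chd, levels = c(0, 1)),
                positive = "1")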
The accuracy is 73%, which seems like a reasonably good fit. As you can see, I did nothing about the statistically insignificant predictors (sbp, adiposity, obesity, and alcohol all have large p-values). What should be done with these insignificant predictors to get better results?
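For example, would it be reasonable to drop the high-p-value predictors by hand, or to let step() do a backward elimination by AIC? A sketch of the two options I have in mind (using the same myfit and newshad2 as above):

# Option 1: automated backward selection by AIC, starting from the full model.
reduced <- step(myfit, direction = "backward", trace = FALSE)
summary(reduced)

# Option 2: drop the non-significant predictors manually and compare the
# nested models with a likelihood-ratio test.
byhand <- glm(as.factor(chd) ~ tobacco + ldl + typea + age + famhist,
              data = newshad2, family = binomial(link = "logit"))
anova(byhand, myfit, test = "Chisq")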