I fit a logistic regression on the newshad2 data and checked its classification accuracy. Here is the code and output:
myfit <- glm(as.factor(chd) ~ ., data = newshad2, family = binomial(link='logit'))
summary(myfit)
Call:
glm(formula = as.factor(chd) ~ ., family = binomial(link = "logit"),
    data = newshad2)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.778  -0.821  -0.439   0.889   2.543

Coefficients:
             Estimate Std. Error z value   Pr(>|z|)
(Intercept) -7.076091   1.340486   -5.28 0.00000013 ***
sbp          0.006504   0.005730    1.14    0.25637
tobacco      0.079376   0.026603    2.98    0.00285 **
ldl          0.173924   0.059662    2.92    0.00355 **
adiposity    0.018587   0.029289    0.63    0.52570
typea        0.039595   0.012320    3.21    0.00131 **
obesity     -0.062910   0.044248   -1.42    0.15509
alcohol      0.000122   0.004483    0.03    0.97835
age          0.045225   0.012130    3.73    0.00019 ***
famhist      0.925370   0.227894    4.06 0.00004896 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 596.11  on 461  degrees of freedom
Residual deviance: 472.14  on 452  degrees of freedom
AIC: 492.1

Number of Fisher Scoring iterations: 5
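(A side note on interpretation: the coefficients above are on the log-odds scale, so exponentiating them gives odds ratios. A quick check against the fitted model above:)

# exp() turns log-odds coefficients into odds ratios; for example,
# exp(0.925370) is about 2.52, i.e. a positive family history multiplies
# the odds of CHD by roughly 2.5, holding the other predictors fixed.
exp(coef(myfit))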
> myfit.pred <- predict(myfit, type = "response")
> head(myfit.pred)
   1    2    3    4    5    6
0.71 0.33 0.28 0.72 0.69 0.62
> myfit.prob <- ifelse(myfit.pred > 0.5, 1, 0)
> table(myfit.prob, newshad2$chd)
myfit.prob   0   1
         0 256  77
         1  46  83
> mean(myfit.prob == newshad2$chd)
[1] 0.73
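(The 0.5 cutoff is just a convention, and since class 1 is the minority here, raw accuracy can be misleading on its own. A threshold-free check of the model, assuming the pROC package is installed:)

library(pROC)
# AUC summarizes ranking performance over all possible cutoffs,
# using the fitted probabilities rather than the 0/1 labels.
roc_obj <- roc(newshad2$chd, myfit.pred)
auc(roc_obj)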
> confusionMatrix(myfit.prob, newshad2$chd)
Confusion Matrix and Statistics
          Reference
Prediction   0   1
         0 256  77
         1  46  83
Accuracy : 0.734
95% CI : (0.691, 0.774)
No Information Rate : 0.654
P-Value [Acc > NIR] : 0.000137
Kappa : 0.384
Mcnemar's Test P-Value : 0.006830
Sensitivity : 0.848
Specificity : 0.519
Pos Pred Value : 0.769
Neg Pred Value : 0.643
Prevalence : 0.654
Detection Rate : 0.554
Detection Prevalence : 0.721
Balanced Accuracy : 0.683
'Positive' Class : 0
>
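(For reference, confusionMatrix() comes from the caret package. Recent caret versions require both arguments to be factors, so the call may need to look like the sketch below; the positive = "1" argument would make sensitivity refer to the CHD cases rather than to class 0, as reported above.)

library(caret)
# Newer caret versions insist on factor inputs with matching levels;
# positive = "1" treats the CHD cases as the positive class.
confusionMatrix(factor(myfit.prob, levels = c(0, 1)),
                factor(newshad2$chd, levels = c(0, 1)),
                positive = "1")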
The accuracy is 73%, which seems like a reasonably good fit. As you can see, I did nothing about the statistically insignificant predictors (sbp, adiposity, obesity, and alcohol all have large p-values). What should be done with these insignificant predictors to get better results?
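For example, would it be reasonable to drop the high-p-value predictors by hand, or to let step() do a backward elimination by AIC? A sketch of the two options I have in mind (using the same myfit and newshad2 as above):

# Option 1: automated backward selection by AIC, starting from the full model.
reduced <- step(myfit, direction = "backward", trace = FALSE)
summary(reduced)

# Option 2: drop the non-significant predictors manually and compare the
# nested models with a likelihood-ratio test.
byhand <- glm(as.factor(chd) ~ tobacco + ldl + typea + age + famhist,
              data = newshad2, family = binomial(link = "logit"))
anova(byhand, myfit, test = "Chisq")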