I am very new to R and statistics in general and have been stuck on this for a couple of weeks, so any input would be greatly appreciated.
I have a binary outcome variable TOTHLOS and 14 categorical predictor variables. I am looking to run a multivariable logistic regression analysis to see which predictors are independently significant for my outcome variable. Each predictor was first screened on its own with chisq.test and a binomial logistic regression, and all were significant at that level. All predictors have been set to factors.
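A minimal sketch of this kind of one-at-a-time screening, assuming simulated data in place of allq. The data frame sim, the effect size, and the "less than 120" level are made up for illustration; only the column names are borrowed from the real data.

```r
# Simulated stand-in for the real data (hypothetical values, not allq)
set.seed(1)
n <- 500
sim <- data.frame(
  OPTIME = factor(sample(c("less than 120", "120-270", "more than 270"),
                         n, replace = TRUE)),
  SMOKE  = factor(sample(c("No", "Yes"), n, replace = TRUE))
)

# In this simulation the outcome depends on OPTIME only
p <- plogis(-1 + 1.2 * (sim$OPTIME == "more than 270"))
sim$TOTHLOS <- rbinom(n, 1, p)

# Chi-squared test of association between one predictor and the outcome
chi <- chisq.test(table(sim$OPTIME, sim$TOTHLOS))
chi$p.value

# Single-predictor logistic regression for the same variable
uni <- glm(TOTHLOS ~ OPTIME, family = binomial(link = "logit"), data = sim)

# Likelihood-ratio test of the whole factor against the intercept-only model
lrt <- anova(uni, test = "Chisq")
lrt
```

Both steps look at one predictor in isolation, so they say nothing about how that predictor behaves once the others are in the model.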
As you can see, many of my predictor p-values become non-significant when all 14 are used. When predictor variables ASA, OPTIME, PRHCT and PRSODM are removed, I get a result a bit closer to what I was intuitively expecting.
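One thing that may be relevant here: the p-values in summary() are per factor level (each level against the reference level), not one test per predictor. A hedged sketch of how whole predictors, and the full versus reduced model, could be compared with likelihood-ratio tests instead. The data are simulated, and sim, full, reduced and the effect sizes are hypothetical stand-ins rather than results from allq.

```r
# Simulated stand-in for the real data (hypothetical values, not allq)
set.seed(42)
n <- 800
sim <- data.frame(
  age     = factor(sample(c("<21", "21-25", "26-40", ">40"), n, replace = TRUE)),
  OPTIME  = factor(sample(c("less than 120", "120-270", "more than 270"),
                          n, replace = TRUE)),
  ASACLAS = factor(sample(c("1", "2", "3"), n, replace = TRUE))
)
eta <- -1 + 1.0 * (sim$OPTIME == "more than 270") + 0.8 * (sim$ASACLAS == "3")
sim$TOTHLOS <- rbinom(n, 1, plogis(eta))

# Columns are referred to by bare name because data = sim is supplied
full    <- glm(TOTHLOS ~ age + OPTIME + ASACLAS, family = binomial, data = sim)
reduced <- glm(TOTHLOS ~ age, family = binomial, data = sim)

# Likelihood-ratio test of the nested models:
# do OPTIME and ASACLAS jointly improve the fit over age alone?
cmp <- anova(reduced, full, test = "Chisq")
cmp

# One test per predictor (all of its levels at once), adjusted for the others
per_term <- drop1(full, test = "Chisq")
per_term
```

drop1() gives a single p-value per predictor, which is closer to the question "is this predictor independently associated with the outcome" than reading the individual level rows.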
I followed up with a multicollinearity check using car::vif, and nothing stood out to me. I'd prefer to use all of my available predictors if possible, but I'm not sure how to proceed with the results I'm getting.
Any recommendations for how to overcome this issue would be greatly appreciated.
> a<-glm(TOTHLOS ~ age + SEX + relevel(RACE_NEW, ref = "White") +
+ + relevel(OTHERCPT_group, ref = "NA NA NA") + OPTIME + WNDCLAS+ relevel(PRSODM, ref = "upper")
+ + allq$SMOKE + allq$HYPERMED + allq$WNDINF
+ + relevel(DYSPNEA, ref = "No") + relevel(allq$DIABETES, ref = "NO") + ASACLAS + relevel(allq$PRHCT, ref = "more than 40%")
+ , family = binomial(link = logit), data = allq)
> summary(a)
Call:
glm(formula = TOTHLOS ~ age + SEX + relevel(RACE_NEW, ref = "White") +
+relevel(OTHERCPT_group, ref = "NA NA NA") + OPTIME + WNDCLAS +
relevel(PRSODM, ref = "upper") + allq$SMOKE + allq$HYPERMED +
allq$WNDINF + relevel(DYSPNEA, ref = "No") + relevel(allq$DIABETES,
ref = "NO") + ASACLAS + relevel(allq$PRHCT, ref = "more than 40%"),
family = binomial(link = logit), data = allq)
Pr(>|z|)
(Intercept) 0.00011 ***
age21-25 0.74438
age26-40 0.46263
age>40 0.38121
SEXmale 0.34529
relevel(RACE_NEW, ref = "White")American Indian or Alaska Native 0.62249
relevel(RACE_NEW, ref = "White")Asian 0.05637 .
relevel(RACE_NEW, ref = "White")Black or African American 0.23644
relevel(RACE_NEW, ref = "White")Native Hawaiian or Pacific Islander 0.73275
relevel(RACE_NEW, ref = "White")Unknown 0.50956
relevel(OTHERCPT_group, ref = "NA NA NA")1 2 3 0.04169 *
relevel(OTHERCPT_group, ref = "NA NA NA")1 2 NA 0.75835
relevel(OTHERCPT_group, ref = "NA NA NA")1 NA NA 0.56602
OPTIME120-270 0.03419 *
OPTIMEmore than 270 1.11e-11 ***
WNDCLAS2-Clean/Contaminated 0.52663
WNDCLAS3-Contaminated 0.84918
WNDCLAS4-Dirty/Infected 0.82255
relevel(PRSODM, ref = "upper")lower 0.10154
allq$SMOKEYes 0.99345
allq$HYPERMEDYes 0.26749
allq$WNDINFYes 0.37926
relevel(DYSPNEA, ref = "No")MODERATE EXERTION 0.76996
relevel(allq$DIABETES, ref = "NO")INSULIN 0.44470
relevel(allq$DIABETES, ref = "NO")NON-INSULIN 0.91492
ASACLAS2-Mild Disturb 0.16185
ASACLAS3-Severe Disturb 4.51e-06 ***
ASACLAS4-Life Threat 0.08671 .
relevel(allq$PRHCT, ref = "more than 40%")Less than 30% 0.33039
relevel(allq$PRHCT, ref = "more than 40%")30-40% 0.21663
> a<-glm(TOTHLOS ~ age + SEX + relevel(RACE_NEW, ref = "White")
+ + relevel(OTHERCPT_group, ref = "NA NA NA") + WNDCLAS
+ + allq$SMOKE + allq$HYPERMED + allq$WNDINF
+ + relevel(DYSPNEA, ref = "No") + relevel(allq$DIABETES, ref = "NO")
+
+ , family = binomial(link = logit), data = allq)
> summary(a)
Call:
glm(formula = TOTHLOS ~ age + SEX + relevel(RACE_NEW, ref = "White") +
relevel(OTHERCPT_group, ref = "NA NA NA") + WNDCLAS + allq$SMOKE +
allq$HYPERMED + allq$WNDINF + relevel(DYSPNEA, ref = "No") +
relevel(allq$DIABETES, ref = "NO"), family = binomial(link = logit),
data = allq)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0482  -0.7535  -0.5747   0.8175   2.3258
Coefficients:
                                                                    Estimate Std. Error z value Pr(>|z|)
(Intercept)                                                         -1.12770    0.29659  -3.802 0.000143 ***
age21-25                                                            -0.24989    0.21445  -1.165 0.243913
age26-40                                                            -0.19453    0.20410  -0.953 0.340550
age>40                                                               0.76249    0.20238   3.768 0.000165 ***
SEXmale                                                              0.23176    0.13600   1.704 0.088359 .
relevel(RACE_NEW, ref = "White")American Indian or Alaska Native    -0.03884    0.78676  -0.049 0.960628
relevel(RACE_NEW, ref = "White")Asian                               -0.66830    0.30019  -2.226 0.025998 *
relevel(RACE_NEW, ref = "White")Black or African American           -0.06473    0.25987  -0.249 0.803295
relevel(RACE_NEW, ref = "White")Native Hawaiian or Pacific Islander  0.47777    0.81317   0.588 0.556837
relevel(RACE_NEW, ref = "White")Unknown                             -0.09488    0.17697  -0.536 0.591878
relevel(OTHERCPT_group, ref = "NA NA NA")1 2 3                       1.60320    0.19818   8.090 5.99e-16 ***
relevel(OTHERCPT_group, ref = "NA NA NA")1 2 NA                      0.69963    0.19148   3.654 0.000258 ***
relevel(OTHERCPT_group, ref = "NA NA NA")1 NA NA                     0.14957    0.17411   0.859 0.390304
WNDCLAS2-Clean/Contaminated                                         -0.58958    0.24428  -2.414 0.015797 *
WNDCLAS3-Contaminated                                               -0.55435    0.52177  -1.062 0.288035
WNDCLAS4-Dirty/Infected                                              0.35967    0.61662   0.583 0.559699
allq$SMOKEYes                                                        0.23343    0.21361   1.093 0.274482
allq$HYPERMEDYes                                                     0.27592    0.23216   1.188 0.234637
allq$WNDINFYes                                                       1.23271    0.57907   2.129 0.033273 *
relevel(DYSPNEA, ref = "No")MODERATE EXERTION                        1.71415    0.76200   2.250 0.024479 *
relevel(allq$DIABETES, ref = "NO")INSULIN                            1.11035    0.69261   1.603 0.108903
relevel(allq$DIABETES, ref = "NO")NON-INSULIN                        0.57316    0.44634   1.284 0.199096
> car::vif(a)
                                               GVIF Df GVIF^(1/(2*Df))
age                                        1.707034  3        1.093219
SEX                                        1.306244  1        1.142910
relevel(RACE_NEW, ref = "White")           1.410674  5        1.035006
relevel(OTHERCPT_group, ref = "NA NA NA")  1.394826  3        1.057028
WNDCLAS                                    1.392669  3        1.056756
allq$SMOKE                                 1.135153  1        1.065436
allq$HYPERMED                              1.495988  1        1.223106
allq$WNDINF                                1.122629  1        1.059542
relevel(DYSPNEA, ref = "No")               1.046603  1        1.023036
relevel(allq$DIABETES, ref = "NO")         1.448180  2        1.096998
relevel(PRSODM, ref = "upper")             1.204106  1        1.097318
ASACLAS                                    2.123131  3        1.133695
relevel(allq$PRHCT, ref = "more than 40%") 1.552441  2        1.116230
OPTIME                                     1.321785  2        1.072236
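For reading the table above: the third column is just the first column rescaled by the degrees of freedom, so that factors with different numbers of levels can be compared. A small sketch that recomputes it for three rows, with the GVIF and Df values copied from the output above. The vector names gvif, df and adj are made up here, and treating the squared value like an ordinary VIF is a common rule of thumb, not something car::vif itself reports.

```r
# GVIF and Df copied from the car::vif() output above (three rows only)
gvif <- c(ASACLAS = 2.123131, HYPERMED = 1.495988, age = 1.707034)
df   <- c(ASACLAS = 3,        HYPERMED = 1,        age = 3)

# Third column of the table: GVIF^(1/(2*Df))
adj <- gvif^(1 / (2 * df))
round(adj, 6)

# Squared, the adjusted value is on the same scale as an ordinary VIF;
# for a 1-df term it equals the GVIF itself
round(adj^2, 3)
```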