These multiple imputation results relate to data I have previously described here: Skewed Distributions for Logistic Regression.
Three variables I am using have missing data. Their names, descriptions and % missing are shown below.
inctoCran - Time from head injury to craniotomy in minutes = 0-2880 (After 2880 minutes is defined as a separate diagnosis) - 58% missing
GCS - Glasgow Coma Scale = 3-15 - 37% missing
rcteyemi - Pupil reactivity (1 = neither, 2 = one, 3 = both) - 56% missing
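As a quick check on the percentages above, here is a minimal sketch in base R. It assumes the sample size (n = 5998) and the NA counts that aregImpute reports further down in this post; the vector names are my own for illustration.

```r
# Sample size and NA counts as reported in the aregImpute output in this post
n_total <- 5998
na_counts <- c(GCS = 2242, inctoCran = 3500, rcteyemi = 3376)

# Percentage missing per variable, rounded to one decimal place
pct_missing <- round(100 * na_counts / n_total, 1)
print(pct_missing)

# On the data frame itself, the equivalent would be something like:
# round(100 * colMeans(is.na(ASDH_Paper1.1[, names(na_counts)])), 1)
```

The rounded values agree with the 37%, 58% and 56% quoted in the list.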
I have been using multiple imputation to model the missing data above, following advice in a previous post here - Describing Results from Logistic Regression with Restricted Cubic Splines Using rms in R
Given that this is a longitudinal analysis, a key variable is the year of treatment, which lets us investigate how our patient management has improved. The variable in question, Yeardecimal, is highly significant in univariate analysis:
> rcs.ASDH<-lrm(formula = Survive ~ Yeardecimalc, data = ASDH_Paper1.1)
>
> rcs.ASDH
Logistic Regression Model
lrm(formula = Survive ~ Yeardecimalc, data = ASDH_Paper1.1)
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 5998 LR chi2 91.47 R2 0.023 C 0.572
0 1281 d.f. 1 g 0.309 Dxy 0.143
1 4717 Pr(> chi2) <0.0001 gr 1.362 gamma 0.146
max |deriv| 3e-12 gp 0.054 tau-a 0.048
Brier 0.165
Coef S.E. Wald Z Pr(>|Z|)
Intercept 0.8696 0.0530 16.42 <0.0001
Yeardecimalc 0.0551 0.0057 9.70 <0.0001
To deal with missingness, I used aregImpute and fit.mult.impute to conduct multiple imputation prior to multivariable logistic regression. When Yeardecimal was included in the imputation model, the results were as follows:
> a <- aregImpute(~ I(Outcome30) + Age + GCS + I(Other) + ISS + inctoCran + I(rcteyemi) + I(neuroFirst) + I(neuroYN) + Mechanism + LOS + Yeardecimalc, nk=4, data = ASDH_Paper1.1, n.impute=10)
Iteration 13
>
> a
Multiple Imputation using Bootstrap and PMM
aregImpute(formula = ~I(Outcome30) + Age + GCS + I(Other) + ISS +
inctoCran + I(rcteyemi) + I(neuroFirst) + I(neuroYN) + Mechanism +
LOS + Yeardecimalc, data = ASDH_Paper1.1, n.impute = 10,
nk = 4)
n: 5998 p: 12 Imputations: 10 nk: 4
Number of NAs:
Outcome30 Age GCS Other ISS inctoCran rcteyemi neuroFirst neuroYN
0 0 2242 0 0 3500 3376 0 0
Mechanism LOS Yeardecimalc
0 0 0
type d.f.
Outcome30 c 1
Age s 3
GCS s 3
Other c 1
ISS s 3
inctoCran s 3
rcteyemi l 1
neuroFirst l 1
neuroYN l 1
Mechanism c 4
LOS s 3
Yeardecimalc s 3
Transformation of Target Variables Forced to be Linear
R-squares for Predicting Non-Missing Values for Each Variable
Using Last Imputations of Predictors
GCS inctoCran rcteyemi
0.421 0.181 0.358
> rcs.ASDH <- fit.mult.impute(Survive ~ rcs(Age) + GCS + Mechanism + rcs(ISS) + neuroFirst + rcs(inctoCrand) + inctoCranYN + rcs(Yeardecimalc) + Sex + Other + rcteyemi,lrm,a,data=ASDH_Paper1.1)
> rcs.ASDH
Logistic Regression Model
fit.mult.impute(formula = Survive ~ rcs(Age) + GCS + Mechanism +
rcs(ISS) + neuroFirst + rcs(inctoCrand) + inctoCranYN + rcs(Yeardecimalc) +
Sex + Other + rcteyemi, fitter = lrm, xtrans = a, data = ASDH_Paper1.1)
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 5998 LR chi2 1609.98 R2 0.365 C 0.836
0 1281 d.f. 25 g 1.584 Dxy 0.672
1 4717 Pr(> chi2) <0.0001 gr 4.875 gamma 0.674
max |deriv| 0.001 gp 0.222 tau-a 0.226
Brier 0.121
Coef S.E. Wald Z Pr(>|Z|)
Intercept 21.3339 67.4400 0.32 0.7517
Age -0.0088 0.0132 -0.67 0.5052
Age' -0.0294 0.0643 -0.46 0.6471
Age'' -0.0134 0.2479 -0.05 0.9570
Age''' 0.2588 0.3534 0.73 0.4639
GCS 0.1100 0.0145 7.61 <0.0001
Mechanism=Fall > 2m -0.0651 0.1162 -0.56 0.5754
Mechanism=Other 0.2285 0.1338 1.71 0.0876
Mechanism=RTC 0.0449 0.1332 0.34 0.7360
Mechanism=Shooting / Stabbing 2.1150 1.1142 1.90 0.0577
ISS -0.1069 0.0318 -3.36 0.0008
ISS' -0.0359 0.1306 -0.27 0.7835
ISS'' 1.8296 1.9259 0.95 0.3421
neuroFirst -0.3483 0.0973 -3.58 0.0003
inctoCrand 0.0001 0.0053 0.02 0.9872
inctoCrand' -0.0745 0.3060 -0.24 0.8077
inctoCrand'' 0.1696 0.5901 0.29 0.7738
inctoCrand''' -0.1167 0.3150 -0.37 0.7110
inctoCranYN -0.2814 0.6165 -0.46 0.6480
Yeardecimalc -0.0101 0.0337 -0.30 0.7641
Yeardecimalc' 0.0386 0.0651 0.59 0.5536
Yeardecimalc'' -0.7417 0.8210 -0.90 0.3663
Yeardecimalc''' 7.0367 4.9344 1.43 0.1539
Sex=Male 0.0668 0.0891 0.75 0.4534
Other=1 0.3238 0.1611 2.01 0.0445
rcteyemi 1.1589 0.1050 11.04 <0.0001
> anova(rcs.ASDH)
Wald Statistics Response: Survive
Factor Chi-Square d.f. P
Age 83.07 4 <.0001
Nonlinear 5.97 3 0.1131
GCS 57.89 1 <.0001
Mechanism 8.14 4 0.0867
ISS 77.31 3 <.0001
Nonlinear 35.04 2 <.0001
neuroFirst 12.81 1 0.0003
inctoCrand 2.32 4 0.6777
Nonlinear 2.29 3 0.5149
inctoCranYN 0.21 1 0.6480
Yeardecimalc 4.19 4 0.3807
Nonlinear 3.77 3 0.2874
Sex 0.56 1 0.4534
Other 4.04 1 0.0445
rcteyemi 121.80 1 <.0001
TOTAL NONLINEAR 47.27 11 <.0001
TOTAL 679.09 25 <.0001
>
Yeardecimal is no longer significant (Wald chi-square 4.19 on 4 d.f., P = 0.38). However, if I exclude Yeardecimal from the aregImpute call only and keep it in the analysis model, I get the alternative result below:
> a <- aregImpute(~ I(Outcome30) + Age + GCS + I(Other) + ISS + inctoCran + I(rcteyemi) + I(neuroFirst) + I(neuroYN) + Mechanism + LOS, nk=4, data = ASDH_Paper1.1, n.impute=10)
Iteration 13
>
> a
Multiple Imputation using Bootstrap and PMM
aregImpute(formula = ~I(Outcome30) + Age + GCS + I(Other) + ISS +
inctoCran + I(rcteyemi) + I(neuroFirst) + I(neuroYN) + Mechanism +
LOS, data = ASDH_Paper1.1, n.impute = 10, nk = 4)
n: 5998 p: 11 Imputations: 10 nk: 4
Number of NAs:
Outcome30 Age GCS Other ISS inctoCran rcteyemi neuroFirst neuroYN Mechanism LOS
0 0 2242 0 0 3500 3376 0 0 0 0
type d.f.
Outcome30 c 1
Age s 3
GCS s 3
Other c 1
ISS s 3
inctoCran s 3
rcteyemi l 1
neuroFirst l 1
neuroYN l 1
Mechanism c 4
LOS s 3
Transformation of Target Variables Forced to be Linear
R-squares for Predicting Non-Missing Values for Each Variable
Using Last Imputations of Predictors
GCS inctoCran rcteyemi
0.407 0.194 0.320
>
> rcs.ASDH <- fit.mult.impute(Survive ~ rcs(Age) + GCS + Mechanism + rcs(ISS) + neuroFirst + rcs(inctoCrand) + inctoCranYN + rcs(Yeardecimalc) + Sex + Other + rcteyemi,lrm,a,data=ASDH_Paper1.1)
> rcs.ASDH
Logistic Regression Model
fit.mult.impute(formula = Survive ~ rcs(Age) + GCS + Mechanism +
rcs(ISS) + neuroFirst + rcs(inctoCrand) + inctoCranYN + rcs(Yeardecimalc) +
Sex + Other + rcteyemi, fitter = lrm, xtrans = a, data = ASDH_Paper1.1)
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 5998 LR chi2 1607.92 R2 0.364 C 0.834
0 1281 d.f. 25 g 1.578 Dxy 0.667
1 4717 Pr(> chi2) <0.0001 gr 4.846 gamma 0.669
max |deriv| 0.003 gp 0.221 tau-a 0.224
Brier 0.120
Coef S.E. Wald Z Pr(>|Z|)
Intercept -55.6574 58.3464 -0.95 0.3401
Age -0.0084 0.0128 -0.66 0.5105
Age' -0.0335 0.0612 -0.55 0.5838
Age'' 0.0050 0.2365 0.02 0.9830
Age''' 0.2321 0.3387 0.69 0.4930
GCS 0.1099 0.0124 8.88 <0.0001
Mechanism=Fall > 2m -0.0631 0.1138 -0.55 0.5793
Mechanism=Other 0.2354 0.1381 1.70 0.0883
Mechanism=RTC 0.0315 0.1319 0.24 0.8114
Mechanism=Shooting / Stabbing 1.9297 1.0930 1.77 0.0775
ISS -0.1012 0.0335 -3.02 0.0025
ISS' -0.0599 0.1366 -0.44 0.6613
ISS'' 2.1581 2.0120 1.07 0.2834
neuroFirst -0.3753 0.0888 -4.23 <0.0001
inctoCrand -0.0007 0.0054 -0.13 0.9002
inctoCrand' -0.0496 0.3116 -0.16 0.8734
inctoCrand'' 0.1316 0.6021 0.22 0.8270
inctoCrand''' -0.1078 0.3224 -0.33 0.7381
inctoCranYN -0.1697 0.6172 -0.27 0.7834
Yeardecimalc 0.0281 0.0291 0.96 0.3349
Yeardecimalc' 0.0682 0.0600 1.14 0.2553
Yeardecimalc'' -1.4037 0.7685 -1.83 0.0678
Yeardecimalc''' 10.2513 4.8156 2.13 0.0333
Sex=Male 0.0595 0.0890 0.67 0.5037
Other=1 0.3579 0.1641 2.18 0.0292
rcteyemi 1.1862 0.0799 14.85 <0.0001
> anova(rcs.ASDH)
Wald Statistics Response: Survive
Factor Chi-Square d.f. P
Age 78.39 4 <.0001
Nonlinear 6.23 3 0.1011
GCS 78.86 1 <.0001
Mechanism 7.53 4 0.1104
ISS 76.46 3 <.0001
Nonlinear 31.16 2 <.0001
neuroFirst 17.87 1 <.0001
inctoCrand 3.22 4 0.5214
Nonlinear 3.19 3 0.3630
inctoCranYN 0.08 1 0.7834
Yeardecimalc 44.83 4 <.0001
Nonlinear 4.67 3 0.1979
Sex 0.45 1 0.5037
Other 4.76 1 0.0292
rcteyemi 220.51 1 <.0001
TOTAL NONLINEAR 45.39 11 <.0001
TOTAL 715.22 25 <.0001
>
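To put the two joint Wald tests for Yeardecimalc side by side, here is a minimal sketch that recomputes the p-values from the chi-square statistics and degrees of freedom in the two anova tables above. The data frame and its column names are my own for illustration; only base R's pchisq is used.

```r
# Joint Wald tests for Yeardecimalc, copied from the two anova() tables above
wald <- data.frame(
  imputation_model = c("with Yeardecimalc", "without Yeardecimalc"),
  chi_square = c(4.19, 44.83),
  df = c(4, 4)
)

# Upper-tail chi-square probability, which is the P column reported by anova()
wald$p_value <- pchisq(wald$chi_square, df = wald$df, lower.tail = FALSE)
print(wald)
```

The first p-value comes out at about 0.38 and the second is far below 0.0001, matching the tables, so the difference lies in the pooled Wald statistics themselves and not in how they are reported.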
Can anyone help me understand why the results for Yeardecimal differ so starkly depending on whether it is included in the aregImpute formula?
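In case it is relevant, the imputation formula and the analysis formula do not use exactly the same variable names. Here is a minimal sketch using base R's all.vars and setdiff to list the differences. The formulas are copied from the first pair of calls above, and the object names are my own. I have not confirmed whether any of these differences explain the result.

```r
# Formulas copied from the aregImpute and fit.mult.impute calls above.
# They are only parsed here, never evaluated, so rms does not need to be loaded.
impute_formula <- ~ I(Outcome30) + Age + GCS + I(Other) + ISS + inctoCran +
  I(rcteyemi) + I(neuroFirst) + I(neuroYN) + Mechanism + LOS + Yeardecimalc

analysis_formula <- Survive ~ rcs(Age) + GCS + Mechanism + rcs(ISS) + neuroFirst +
  rcs(inctoCrand) + inctoCranYN + rcs(Yeardecimalc) + Sex + Other + rcteyemi

# all.vars() returns variable names only, dropping function names such as rcs and I
in_analysis_only <- setdiff(all.vars(analysis_formula), all.vars(impute_formula))
in_impute_only <- setdiff(all.vars(impute_formula), all.vars(analysis_formula))

print(in_analysis_only)  # in the lrm model but not in the imputation model
print(in_impute_only)    # in the imputation model but not in the lrm model
```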