I want to calculate a Vaccine Effectiveness (Influenza) using Hazard Ratios from a Cox PH model. Methods are similar to this article : https://www.ncbi.nlm.nih.gov/pubmed/27813473
I am not so good in mathematics, I have a medical background and I am a R user with statistics basis.
I am using real world data (drug sales). The vaccination variable = vaccine sale 14 days before event The event is a basket sale combining drugs like paracetamol + anti-cough (unspecific but highly correlated to flu epidemic).
Here is the ouput :
> mod1 <- coxph(
> Surv(db3$time, db3$panier) ~ sex + vaccination + age_class,
> data = db3
> )
> summary(mod1)
Call:
coxph(formula = Surv(db3$time, db3$panier) ~ sex + vaccination +
age_class, data = db3)
n= 2437352, number of events= 211732
coef exp(coef) se(coef) z Pr(>|z|)
sex1 -0.132444 0.875952 0.004624 -28.64 <2e-16 ***
vaccination -0.405701 0.666509 0.006545 -61.99 <2e-16 ***
age_class(10,20] -0.407904 0.665043 0.014687 -27.77 <2e-16 ***
age_class(20,30] -0.903083 0.405318 0.012527 -72.09 <2e-16 ***
age_class(30,40] -0.948397 0.387362 0.011552 -82.10 <2e-16 ***
age_class(40,50] -1.045602 0.351480 0.011140 -93.86 <2e-16 ***
age_class(50,60] -1.134455 0.321597 0.010823 -104.82 <2e-16 ***
age_class(60,70] -1.303770 0.271506 0.010889 -119.73 <2e-16 ***
age_class(70,80] -1.415494 0.242806 0.011778 -120.18 <2e-16 ***
age_class(80,90] -1.615479 0.198796 0.013093 -123.39 <2e-16 ***
age_class(90,100] -2.024649 0.132040 0.023573 -85.89 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
sex1 0.8760 1.142 0.8680 0.8839
vaccination 0.6665 1.500 0.6580 0.6751
age_class(10,20] 0.6650 1.504 0.6462 0.6845
age_class(20,30] 0.4053 2.467 0.3955 0.4154
age_class(30,40] 0.3874 2.582 0.3787 0.3962
age_class(40,50] 0.3515 2.845 0.3439 0.3592
age_class(50,60] 0.3216 3.109 0.3148 0.3285
age_class(60,70] 0.2715 3.683 0.2658 0.2774
age_class(70,80] 0.2428 4.119 0.2373 0.2485
age_class(80,90] 0.1988 5.030 0.1938 0.2040
age_class(90,100] 0.1320 7.573 0.1261 0.1383
Concordance= 0.619 (se = 0.001 )
Rsquare= 0.016 (max possible= 0.92 )
Likelihood ratio test= 40343 on 11 df, p=0
Wald test = 42384 on 11 df, p=0
Score (logrank) test = 47349 on 11 df, p=0
Validation
> validation <- cox.zph(mod1)
> validation
rho chisq p
sex1 -0.028494 1.72e+02 0.00e+00
vaccination 0.210335 9.30e+03 0.00e+00
age_class(10,20] 0.009616 1.96e+01 9.65e-06
age_class(20,30] -0.005949 7.50e+00 6.18e-03
age_class(30,40] -0.004818 4.92e+00 2.66e-02
age_class(40,50] -0.000899 1.71e-01 6.79e-01
age_class(50,60] 0.007783 1.28e+01 3.45e-04
age_class(60,70] 0.001600 5.40e-01 4.62e-01
age_class(70,80] -0.013171 3.66e+01 1.47e-09
age_class(80,90] -0.021582 9.90e+01 0.00e+00
age_class(90,100] -0.018040 6.92e+01 1.11e-16
GLOBAL NA 1.05e+04 0.00e+00
My question
How can I validate this model ? I have read that p-values can be messy when using large datasets... The cox.zph
method with variables > 0.05 is not an option here.
I tested the model with the variable sex, I cant validate the model either.
I want to know if I can use the Hazard ratio despite the fact I cant validate the model.
Or maybe I have to use graphical methods ?
Thank you for your help