Survival analysis using an unbalanced sample

Question

I am new to survival analysis. Below is my data, which has very unbalanced group sizes: the treatment group has 2 samples (1 event, 1 censored) and the control group has 700+ samples. I fit a Cox regression with the 'survival' package in R, and the results show 3 different tests (likelihood ratio test, log-rank test and Wald test).

sample  trt  censor  time
A7      TRT  0       1.0219178
BH      TRT  1       0.6136986
SB      C    0       0.7095890
SD      C    0       1.1972603
SE      C    0       3.6191781
..      ..   ..      ..
A1      C    0       4.0082192

My code:

library(survival)
coxph(Surv(time, censor) ~ trt, data = dataAll)

Result:

> coxfit
Call:
coxph(formula = Surv(time, censor) ~ trt, data = dataAll)

  n= 772, number of events= 100 

                 coef exp(coef) se(coef)      z Pr(>|z|)    
trtC -3.80047   0.02236  1.04854 -3.625 0.000289 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

             exp(coef) exp(-coef) lower .95 upper .95
trtC   0.02236      44.72  0.002864    0.1746

Concordance= 0.513  (se = 0.002 )
Rsquare= 0.007   (max possible= 0.73 )
Likelihood ratio test= 5.55  on 1 df,   p=0.01845
Wald test            = 13.14  on 1 df,   p=0.0002895
Score (logrank) test = 38.85  on 1 df,   p=4.579e-10

My questions are:

There are 3 tests giving quite different p values, with the likelihood ratio test the most conservative. Do they all test the significance of the Cox coefficient? Which one should I choose?
Given that the treatment group has so few samples, can the p value be trusted?
Is it appropriate to apply Cox regression to an unbalanced sample? If not, are there any alternative methods?

Thanks a lot!

J

score 4 · Answer 1 · edited Apr 13 '17 at 12:44

The differences between the 3 tests are nicely explained on this Cross Validated page in the related context of logistic regression, with a link to further information. Simply put, as the sample size increases the 3 tests will converge to the same result, but you might need a really large sample size for that to occur. The answer on that page suggests that the likelihood-ratio test is often considered the "best."
Would you trust any result that depended on only 1 or 2 cases? That's what you have here. The Cox regression is based on covariate values for cases that are still at risk (neither censored nor already having had the event) at each event time. There is only 1 event in the TRT group (which seems to be very early in time), and the 1 censored case in the TRT group provides no information for events occurring after its censoring time of 1.02 units, which also might be earlier than most of the control case times.
"Appropriate" is a difficult word in this context. Rather than focus on a statistical test per se, you should probably pay more attention to the underlying subject matter, in particular why there is such an imbalance between the 2 groups and if there were any other considerations that might have accounted for the apparently very early event in the TRT group. With only 1 event in the TRT group you cannot test the proportional-hazards assumption that underlies the Cox regression. With only 2 groups and no other covariates you effectively have already done the non-parametric equivalent of the Cox regression, the log-rank test, which is in this case equivalent to the Cox regression score test but can allow "significant results for survivorship prediction models that have low accuracy." The test might be "appropriate"; the question is interpretation of the result.
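
The equivalence between the log-rank test and the Cox score test can be checked directly. The sketch below uses simulated data, not the poster's `dataAll` (which is not available here); the data frame `sim` and its columns are made up for illustration, and the times are continuous so there are no ties.

```r
library(survival)

# Simulated, balanced two-group data for illustration only
# (this is NOT the data from the question)
set.seed(1)
n <- 200
sim <- data.frame(
  time  = rexp(n, rate = 0.2),                      # continuous times, so no ties
  event = rbinom(n, 1, 0.6),                        # 1 = event, 0 = censored
  grp   = factor(rep(c("C", "TRT"), each = n / 2))
)

fit <- summary(coxph(Surv(time, event) ~ grp, data = sim))
lr  <- survdiff(Surv(time, event) ~ grp, data = sim)

# The three tests reported by summary(coxph(...))
lrt_stat   <- unname(fit$logtest["test"])
wald_stat  <- unname(fit$waldtest["test"])
score_stat <- unname(fit$sctest["test"])

# Compare them with the log-rank chi-square from survdiff()
print(c(LRT = lrt_stat, Wald = wald_stat, Score = score_stat, LogRank = lr$chisq))
```

With one binary covariate and no tied event times, the `Score` and `LogRank` entries should agree, while the likelihood ratio and Wald statistics are free to differ from them in finite samples.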

Todd D · Answer 2 · 2020-07-24T14:11:57.090

Regarding the last two questions, I would suggest that the overall size of the study is generally good, but the imbalance between the groups creates serious problems with power.

We can calculate a retrospective power based on the data you provide using R:

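# Retrospective power for the Cox model
# install.packages("powerSurvEpi")  # run once if the package is not installed
library(powerSurvEpi)

# k = ratio of treated to control participants, m = total number of events,
# RR = postulated hazard ratio
powerCT.default0(k = 2/770, m = 100, RR = 0.022, alpha = 0.05)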

Which results in a power of 0.072 or ~7%.

To see how much the imbalance matters, set the treatment allocation to 1:1 and keep the number of events and the effect size the same:

powerCT.default0(k= 1,m=100,RR=0.022, alpha=0.05)

Giving a power of 1.0 or ~100%

This demonstrates the cost of the unbalanced sample. Assuming a weaker treatment effect (a hazard ratio closer to 1) lowers the power further, because detecting a smaller departure from the null requires more events:

powerCT.default0(k=(2/770),m=100,RR=0.5, alpha=0.05)

Resulting in power of 0.044 or ~4%.
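
The three power figures can also be reproduced in base R. This is a sketch under the assumption that `powerCT.default0` uses the normal approximation pnorm(sqrt(k * m) * |RR - 1| / (k * RR + 1) - z), with z the upper alpha/2 normal quantile; the helper name `power_ct` is hypothetical and is not part of `powerSurvEpi`.

```r
# Hypothetical helper: the closed form that powerCT.default0 is ASSUMED to use
power_ct <- function(k, m, RR, alpha = 0.05) {
  z_alpha <- qnorm(1 - alpha / 2)                    # two-sided critical value
  pnorm(sqrt(k * m) * abs(RR - 1) / (k * RR + 1) - z_alpha)
}

p_unbalanced <- power_ct(k = 2 / 770, m = 100, RR = 0.022)  # 2 treated vs 770 controls
p_balanced   <- power_ct(k = 1,       m = 100, RR = 0.022)  # 1:1 allocation
p_weaker     <- power_ct(k = 2 / 770, m = 100, RR = 0.5)    # hazard ratio closer to 1

print(round(c(unbalanced = p_unbalanced,
              balanced   = p_balanced,
              weaker     = p_weaker), 3))
```

Under this approximation the allocation ratio enters mainly through sqrt(k * m), which is about 0.51 for k = 2/770 and 10 for k = 1, so the imbalance alone accounts for most of the loss of power.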

We can conclude that your unbalanced sample is a threat to the validity of the Cox method. There is no method I can think of that will handle this imbalance well. Even a simple Fisher's exact test gives a p-value of 0.24 using the treatment and event groups as you describe.
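
The Fisher's exact test can be reproduced from the counts in the question: 2 treated patients with 1 event and, since there are 100 events among 772 patients, 770 controls with 99 events. The table layout below is one reasonable reading of "treatment and event groups", not necessarily the exact call behind the figure quoted above.

```r
# 2 x 2 table of group by event status, derived from the counts in the question:
# TRT: 1 event, 1 censored; C: 99 events, 671 censored (772 patients, 100 events)
tab <- matrix(
  c( 1,   1,
    99, 671),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("TRT", "C"), status = c("event", "no event"))
)

ft <- fisher.test(tab)   # two-sided by default
print(round(ft$p.value, 2))
```

Note that this test ignores the follow-up times entirely and only compares the proportion of patients with an event in each group.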

Why is 2/772 used for k? I understand that there are 2 observations with treatment, but shouldn't the denominator be only the controls, i.e. 770? Maybe it does not make a big difference in this example, but in one with more treated samples it is crucial. Thanks. — pikachu, Jul 22 '20 at 08:16
You are correct. I have changed the code. This change did not alter the results, but is now correct for reference. — Todd D, Jul 24 '20 at 14:12
