I have just fitted a Cox non-proportional hazards regression, coxph(formula = Surv(TIME, TIME2, DEL) ~ SCORE10K + Z, data = WRDS), where SCORE10K and Z are two time-dependent variables that should predict bankruptcy. However, when I check for violations of the proportional hazards assumption, I get this strange output. SCORE10K and Z are decimal (continuous) variables.

> summary(viol.cox)
          Length Class  Mode     
table      9     -none- numeric  
x         25     -none- numeric  
y         50     -none- numeric  
var        4     -none- numeric  
call       2     -none- call     
transform  1     -none- character

> cox.zph(model.coxph)
         rho chisq   p
SCORE10K  NA   NaN NaN
Z         NA   NaN NaN
GLOBAL    NA   NaN NaN
Warning message:
In cor(xx, r2) : the standard deviation is zero
Xi'an
mariapena
  • Also I would like to add that I think this is due to the fact that SCORE10K and Z both have quite a few NAs in the dataset. But I still wouldn't know how to fix the problem even if this is the reason. – mariapena Nov 30 '16 at 10:03
  • So I don't think that NAs are the problem after all, because I just checked the hazard model with another time-dependent variable that has no NAs (or very few), removed the Z score, and I still get the same output from the cox.zph function. – mariapena Nov 30 '16 at 11:15
  • This is off-topic here but if you repost on a more appropriate site you will need to give a reproducible example. – mdewey Dec 01 '16 at 09:49
  • @mdewey I am not sure I understand why this is off-topic. But you are a statistician from what I see! Any chance you would know how I should approach this modelling issue? I am really confused about what model to turn to if a hazard model doesn't work with my data set. See below. – mariapena Dec 01 '16 at 10:22
  • Because it is about interpreting an R error message not about a statistical problem. If you find why it gives the error message and can then tell us what feature of your data is causing the problem then there may be a statistical question behind it. – mdewey Dec 01 '16 at 10:29
  • @mdewey Indeed! But it seems Yuval figured out that the warning comes from the fact that all the Y=1 events (bankruptcies) take place at the same time, t=4, if they take place at all. Hence there is no variance in the event times, as all the 1s are at t=4 only. So the statistical problem of no variance is due to the structure of the dataset. However, I cannot add data points: the data gathering (annual reports) was a very extensive process, and now I need to find the appropriate model. – mariapena Dec 01 '16 at 10:35
  • Perhaps now is the moment to edit some of this extra information into your question and ask what courses of action are open to you? Unless you already have something worked out. – mdewey Dec 01 '16 at 12:20
  • I just have, @mdewey; please refer to my initial question, which Yuval also mentions in his answer below. Thanks for your help again! – mariapena Dec 01 '16 at 15:27

1 Answer

Looking at your other question for information on the data, I think I know the problem. If indeed all of the events occur in the 4th spell (time period), then you have no variance in the event times, and cox.zph cannot compute the correlation it reports as rho.

For example, in this dummy data.frame there are 4 individuals, 2 of whom experience the event, both in the same spell (the 3rd):

library(survival)

# Four subjects in counting-process form; both events fall in the 3rd spell
testDS <- data.frame(id=c(1,1,1,2,2,2,3,3,3,4,4,4), t_start=c(1,2,3,1,2,3,1,2,3,1,2,3), 
                     event=c(0,0,0,0,0,0,0,0,1,0,0,1), ind1=c(10,12,15,9,5,11,10,12,30,21,21,27))
testDS$t_end <- testDS$t_start+1
test_cox <- coxph(Surv(t_start, t_end, event)~ind1, data=testDS)
(cox.zph(test_cox))

gets:

> (cox.zph(test_cox))
     rho chisq   p
ind1  NA   NaN NaN
Warning message:
In cor(xx, r2) : the standard deviation is zero

Yet when I add a 5th individual who experienced the event at the 2nd spell:

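testDS1 <- data.frame(id=c(1,1,1,2,2,2,3,3,3,4,4,4,5,5), t_start=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2), 
                     event=c(0,0,0,0,0,0,0,0,1,0,0,1,0,1), ind1=c(10,12,15,9,5,11,10,12,30,21,21,27,13,40))
testDS1$t_end <- testDS1$t_start+1
# Refit on the extended data before testing the assumption again
test_cox <- coxph(Surv(t_start, t_end, event)~ind1, data=testDS1)
(cox.zph(test_cox))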

I get no warnings:

> (cox.zph(test_cox))
        rho   chisq     p
ind1 -0.114 0.00835 0.927
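
To see whether a data set is in this situation before fitting anything, one option is to count the distinct times at which events occur. The sketch below uses base R only and rebuilds just the time and event columns of the two dummy data sets; n_event_times is a hypothetical helper name, not a function from the survival package, and it assumes the event indicator is coded 0/1.

```r
# Hypothetical helper (not from the survival package): number of distinct
# interval end times at which at least one event is recorded.
# Assumes `event` is coded 0/1; rows with a missing indicator are ignored.
n_event_times <- function(stop_time, event) {
  is_event <- !is.na(event) & event == 1
  length(unique(stop_time[is_event]))
}

# Same layout as testDS: every event falls in the 3rd spell (t_end = 4).
t_end_same <- rep(c(2, 3, 4), times = 4)
event_same <- c(0,0,0, 0,0,0, 0,0,1, 0,0,1)
k_same <- n_event_times(t_end_same, event_same)

# Same layout as testDS1: a 5th subject has the event in the 2nd spell.
t_end_diff <- c(t_end_same, 2, 3)
event_diff <- c(event_same, 0, 1)
k_diff <- n_event_times(t_end_diff, event_diff)

k_same  # a single distinct event time
k_diff  # more than one distinct event time
```

For the data in the question, the equivalent call would presumably be n_event_times(WRDS$TIME2, WRDS$DEL), assuming TIME2 is the interval end and DEL the 0/1 event indicator. A result of 1 would be consistent with the NA/NaN output and the "standard deviation is zero" warning.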

The point is that, with every event falling in the same period, a Cox hazard model might not be the right one for your needs.

Yuval Spiegler
  • Thank you, Yuval. However, my thesis supervisor made it clear that hazard modelling is the best option with this dataset. Is it just the specification of the Cox model that makes it impossible to test for assumption violations? Or do I need to restructure my data? The Therneau paper you shared with me suggests that my data structure should be fine... – mariapena Dec 01 '16 at 09:46
  • Hmm.. this might be better answered by one of the statisticians here. I'll try to find something about it also. – Yuval Spiegler Dec 01 '16 at 10:10
  • I still have not found a solution to this data structure problem... Any chance you found something? – mariapena Dec 02 '16 at 08:47