2

I am trying to manually calculate the schoenfeld residuals in Cox model. Please see the code below. Sresid gives the calculation in cox.zph function using schoenfeld residuals. Schoenfeld gives the result in r when based on own code. These two output are different. How can I make sresid=Schoenfeld?

> rm(list=ls()); set.seed(1234)
> library("survival")
> hmohiv<-read.table("https://stats.idre.ucla.edu/stat/r/examples/asa/hmohiv.csv", sep=",", header = TRUE)
> attach(hmohiv)
> #______Schoenfeld Residuals from coxph()__________
> res.cox<-coxph( Surv(time, censor)~drug+age)
> sresid <- residuals(res.cox, "schoenfeld")
> #______________Own code__________
> beta<-as.matrix( c (res.cox$coefficients[1], res.cox$coefficients[2]) )
> newdata <- hmohiv[ which(censor==1), ]
> OrderedData <- newdata[order(newdata$time), ]; #order the data according to the event times
> x_ordered<-as.matrix(OrderedData[,3:4]) #Consider x1 and x2.
> n<-80 #length of newdata;
> Numerator1<-numeric(n);Numerator2<-  numeric(n);Denominator<-numeric(n)
> Num1<-  numeric(n);Num2<-  numeric(n);Den<-numeric(n)
> for(j in 1:n) { 
  Numerator1[j] <- x_ordered[j,1]*exp( x_ordered[j,]%*%beta) 
  Numerator2[j] <- x_ordered[j,2]*exp( x_ordered[j,]%*%beta)     
  Denominator[j]<-exp( x_ordered[j,] %*%beta) }
> for(j in 1:n) {
  Num1[j] <- sum(Numerator1[j:n])
   Num2[j]<- sum(Numerator2[j:n])
   Den[j] <- sum(Denominator[j:n]) }
> Schoenfeld<-as.matrix(cbind( (x_ordered[,1]-Num1/Den), (x_ordered[,2]-Num2/Den))  )
> head(Schoenfeld) #Residuals based on own code
         [,1]        [,2]
13  -9.755948  0.98106142
16 -17.756333 -0.01889986
28  -6.756333  0.98110014
32 -12.761970 -0.01808142
52 -22.761991 -0.01808145
54  -3.761991  0.98191855
> head(sresid) #Residuals based on res.cox
       drug       age
1  0.300015  4.090062
1 -0.699985 -3.909938
1  0.300015  7.090062
1 -0.699985  1.090062
1 -0.699985 -8.909938
1  0.300015 10.090062

1 Answers1

3

Ok, to start with you can tell who is wrong by looking at sums. The Schoenfeld residuals add up to the partial likelihood score, which is zero by construction at $\beta=\hat\beta$.

> colSums(Schoenfeld)
[1] -936.12129   36.28693
> colSums(sresid)
         drug           age 
 2.373102e-15 -2.877698e-13 

I'm going to try to make minimal changes to your code; this is not how I would have done the coding.

Ok. One problem is this

newdata <- hmohiv[ which(censor==1), ]
OrderedData <- newdata[order(newdata$time), ]; #order the data according to the event times
x_ordered<-as.matrix(OrderedData[,3:4]) #Consider x1 and x2.

There is only a Schoenfeld residual at event times, but Numerator1 and Numerator2 and Denominator need to use all individuals in the risk set, including people who will be censored in the future, who you are leaving out.

A second problem is this

> head(x_ordered)
   age drug
13  44    1
16  36    0
28  47    1
32  41    0
52  31    0
54  50    1
> beta
           [,1]
drug 1.01669835
age  0.09714082

You've got the variables around the wrong way.

Fixing these, we get

>  head(Schoenfeldr) #Residuals based on own code
         [,1]      [,2]
13  0.2818468  3.740749
16 -0.7106238 -4.159318
28  0.2862046  6.822119
32 -0.7031342  1.076245
52 -0.7084648 -8.915596
54  0.2894962 10.058744

which is still wrong but a lot closer.

Next, ties:

> length(unique(hmohiv$time))
[1] 31
> sum(hmohiv$censor)
[1] 80

Your Numerator and Denominator should be constant for observations at the same time, but they aren't. This means that if you change the starting order for hmohiv, your Schoenfeld residuals change but the built-in ones don't.

Fixing this gives us

> head(Schoenfeldr)
         [,1]       [,2]
13  0.2818468  3.7407492
16 -0.7181532 -4.2592508
28  0.2818468  6.7407492
32 -0.7181532  0.7407492
52 -0.7181532 -9.2592508
54  0.2818468  9.7407492
> head(sresid)
       drug       age
1  0.300015  4.090062
1 -0.699985 -3.909938
1  0.300015  7.090062
1 -0.699985  1.090062
1 -0.699985 -8.909938
1  0.300015 10.090062

That doesn't quite work. The next issue is more subtle, but again is ties.

The formula for the Schoenfeld residuals is more complicated with the default Efron approximation to ties, because the approximation changes how the numerator and denominator are calculated. If we switch to the Breslow approximation, we do get exact agreement:

rm(list=ls()); set.seed(1234)
 library("survival")
 hmohiv<-read.table("https://stats.idre.ucla.edu/stat/r/examples/asa/hmohiv.csv", sep=",", header = TRUE)
 #______Schoenfeld Residuals from coxph()__________
 res.cox<-coxph( Surv(time, censor)~drug+age, data=hmohiv,ties="breslow")
 sresid <- residuals(res.cox, "schoenfeld")
 #______________Own code__________
 beta<-as.matrix( c (res.cox$coefficients[1], res.cox$coefficients[2]) )
 OrderedData <- hmohiv[order(hmohiv$time), ] #order the data according to the event times
 x_ordered<-as.matrix(OrderedData[ ,4:3]) #Consider x1 and x2.
 n<-nrow(OrderedData)
 Numerator1<-numeric(n);Numerator2<-  numeric(n);Denominator<-numeric(n)
 Num1<-  numeric(n);Num2<-  numeric(n);Den<-numeric(n)
 for(j in 1:n) { 
  Numerator1[j] <- x_ordered[j,1]*exp( x_ordered[j,]%*%beta) 
  Numerator2[j] <- x_ordered[j,2]*exp( x_ordered[j,]%*%beta)     
  Denominator[j]<-exp( x_ordered[j,] %*%beta) 
 }
 
 for(j in 1:n) {
   thistime<-OrderedData$time[j]
   riskset <- which(thistime<=OrderedData$time)
   Num1[j] <- sum(Numerator1[riskset])
   Num2[j]<- sum(Numerator2[riskset])
   Den[j] <- sum(Denominator[riskset]) 
 }
 
Schoenfeld<-as.matrix(cbind( (x_ordered[,1]-Num1/Den), (x_ordered[,2]-Num2/Den))  )
Schoenfeldr<- Schoenfeld[OrderedData$censor==1, ]

Now we get

>  head(Schoenfeldr) #Residuals based on own code
         [,1]       [,2]
13  0.2957278  3.9555414
16 -0.7042722 -4.0444586
28  0.2957278  6.9555414
32 -0.7042722  0.9555414
52 -0.7042722 -9.0444586
54  0.2957278  9.9555414
>  head(sresid) #Residuals based on res.cox
        drug        age
1  0.2957278  3.9555414
1 -0.7042722 -4.0444586
1  0.2957278  6.9555414
1 -0.7042722  0.9555414
1 -0.7042722 -9.0444586
1  0.2957278  9.9555414

PS: attach() is increasingly unpopular these days, for good reason.

Thomas Lumley
  • 21,784
  • 1
  • 22
  • 73
  • Thank you for the well detailed answer. However, I still get different Chi-square values for own code as compared to the cox.zph() function. https://stats.stackexchange.com/questions/531481/computing-chi-square-values-from-schoenfeld-residuals – John Majimboni Jun 20 '21 at 20:36