Ok, to start with you can tell who is wrong by looking at sums. The Schoenfeld residuals add up to the partial likelihood score, which is zero by construction at $\beta=\hat\beta$.
> colSums(Schoenfeld)
[1] -936.12129 36.28693
> colSums(sresid)
drug age
2.373102e-15 -2.877698e-13
I'm going to try to make minimal changes to your code; this is not how I would have done the coding.
Ok. One problem is this
newdata <- hmohiv[ which(censor==1), ]
OrderedData <- newdata[order(newdata$time), ]; #order the data according to the event times
x_ordered<-as.matrix(OrderedData[,3:4]) #Consider x1 and x2.
There is only a Schoenfeld residual at event times, but Numerator1
and Numerator2
and Denominator
need to use all individuals in the risk set, including people who will be censored in the future, who you are leaving out.
A second problem is this
> head(x_ordered)
age drug
13 44 1
16 36 0
28 47 1
32 41 0
52 31 0
54 50 1
> beta
[,1]
drug 1.01669835
age 0.09714082
You've got the variables around the wrong way.
Fixing these, we get
> head(Schoenfeldr) #Residuals based on own code
[,1] [,2]
13 0.2818468 3.740749
16 -0.7106238 -4.159318
28 0.2862046 6.822119
32 -0.7031342 1.076245
52 -0.7084648 -8.915596
54 0.2894962 10.058744
which is still wrong but a lot closer.
Next, ties:
> length(unique(hmohiv$time))
[1] 31
> sum(hmohiv$censor)
[1] 80
Your Numerator
and Denominator
should be constant for observations at the same time, but they aren't. This means that if you change the starting order for hmohiv
, your Schoenfeld residuals change but the built-in ones don't.
Fixing this gives us
> head(Schoenfeldr)
[,1] [,2]
13 0.2818468 3.7407492
16 -0.7181532 -4.2592508
28 0.2818468 6.7407492
32 -0.7181532 0.7407492
52 -0.7181532 -9.2592508
54 0.2818468 9.7407492
> head(sresid)
drug age
1 0.300015 4.090062
1 -0.699985 -3.909938
1 0.300015 7.090062
1 -0.699985 1.090062
1 -0.699985 -8.909938
1 0.300015 10.090062
That doesn't quite work. The next issue is more subtle, but again is ties.
The formula for the Schoenfeld residuals is more complicated with the default Efron approximation to ties, because the approximation changes how the numerator and denominator are calculated. If we switch to the Breslow approximation, we do get exact agreement:
rm(list=ls()); set.seed(1234)
library("survival")
hmohiv<-read.table("https://stats.idre.ucla.edu/stat/r/examples/asa/hmohiv.csv", sep=",", header = TRUE)
#______Schoenfeld Residuals from coxph()__________
res.cox<-coxph( Surv(time, censor)~drug+age, data=hmohiv,ties="breslow")
sresid <- residuals(res.cox, "schoenfeld")
#______________Own code__________
beta<-as.matrix( c (res.cox$coefficients[1], res.cox$coefficients[2]) )
OrderedData <- hmohiv[order(hmohiv$time), ] #order the data according to the event times
x_ordered<-as.matrix(OrderedData[ ,4:3]) #Consider x1 and x2.
n<-nrow(OrderedData)
Numerator1<-numeric(n);Numerator2<- numeric(n);Denominator<-numeric(n)
Num1<- numeric(n);Num2<- numeric(n);Den<-numeric(n)
for(j in 1:n) {
Numerator1[j] <- x_ordered[j,1]*exp( x_ordered[j,]%*%beta)
Numerator2[j] <- x_ordered[j,2]*exp( x_ordered[j,]%*%beta)
Denominator[j]<-exp( x_ordered[j,] %*%beta)
}
for(j in 1:n) {
thistime<-OrderedData$time[j]
riskset <- which(thistime<=OrderedData$time)
Num1[j] <- sum(Numerator1[riskset])
Num2[j]<- sum(Numerator2[riskset])
Den[j] <- sum(Denominator[riskset])
}
Schoenfeld<-as.matrix(cbind( (x_ordered[,1]-Num1/Den), (x_ordered[,2]-Num2/Den)) )
Schoenfeldr<- Schoenfeld[OrderedData$censor==1, ]
Now we get
> head(Schoenfeldr) #Residuals based on own code
[,1] [,2]
13 0.2957278 3.9555414
16 -0.7042722 -4.0444586
28 0.2957278 6.9555414
32 -0.7042722 0.9555414
52 -0.7042722 -9.0444586
54 0.2957278 9.9555414
> head(sresid) #Residuals based on res.cox
drug age
1 0.2957278 3.9555414
1 -0.7042722 -4.0444586
1 0.2957278 6.9555414
1 -0.7042722 0.9555414
1 -0.7042722 -9.0444586
1 0.2957278 9.9555414
PS: attach()
is increasingly unpopular these days, for good reason.