
I developed a Cox model for overall cancer survival with the rms package. I use a dataset with 765 observations (194 internal, 571 external), so I divided it into two datasets: training and validation.

Fitting the Cox model works, but when I try to validate it I get this error:

    variable lengths differ (found for 'N')
    In addition: Warning message:
    'newdata' had 571 rows but variables found have 194 rows
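For context, the warning text matches the one base R's `model.frame()` emits when a prediction formula's variables are not columns of `newdata`: they are then looked up in the formula's environment, where vectors of a different length may live. The minimal reproduction below uses `lm()` and made-up vectors `x_train` and `y_train` (hypothetical, not from the original data); it assumes, without having checked the rms internals, that the same lookup happens inside `val.surv()`.

```r
# Training data stored as free-standing vectors, not as columns of a data frame
x_train <- c(1, 2, 3, 4, 5)
y_train <- c(2.1, 3.9, 6.2, 8.1, 9.8)
fit <- lm(y_train ~ x_train)

# 'newdata' has 3 rows but no column called 'x_train'
new <- data.frame(x = c(10, 20, 30))

# model.frame() cannot find 'x_train' in 'new', so it falls back to the
# length-5 vector in the formula's environment and warns about the mismatch
msg <- tryCatch(predict(fit, newdata = new),
                warning = function(w) conditionMessage(w))
print(msg)
```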

Can someone help me? :)

This is my code:

    # Load packages
    library(survival)
    library(rms)
    library(dplyr)

    # Define survival variables: Zeit = time, Status = status
    Zeit <- mydata$UEBERLEBEN
    Status <- mydata$TOD

    Alter <- mydata$KRANK_ALT
    KOHORTE <- mydata$KOHORTE

    # Define predictor variables
    set.seed(88)
    units(Zeit) <- "Year"
    label(Alter) <- "Age at diagnosis"

    T <- factor(mydata$HIGHEST_T, levels = 1:4, labels = c('T1', 'T2', 'T3', 'T4'))
    N <- factor(mydata$N_STAGE_EINFACH, levels = 0:1, labels = c('N0', '>=N1'))
    M <- factor(mydata$HIGHEST_M, levels = 0:1, labels = c('M0', 'M1'))
    Rezidiv <- factor(mydata$REZIDIV, levels = 0:1, labels = c('NO', 'YES'))
    Malig <- factor(mydata$MALIG, levels = 1:2, labels = c('Low GRADE', 'HIGH GRADE'))
    Histo <- factor(mydata$HIST_WICHTIG, levels = 8:13, labels = c('Adeno-Ca', 'Acinic-Ca', 'ACC', 'MEC', 'PEC', 'other'))
    Geschlecht <- factor(mydata$SEX, levels = 1:2, labels = c('female', 'male'))

    # Define survival object
    S <- Surv(Zeit, Status == 1)

    # Define datadist
    dd <- datadist(Geschlecht, T, N, M, Alter, Rezidiv, Malig, Histo)
    options(datadist = 'dd')

    # Build Cox model (KOHORTE == 0 means training data)
    Cox <- cph(S ~ T + N + M + Alter + Histo, surv = TRUE, subset = KOHORTE == 0,
               time.inc = 5, x = TRUE, y = TRUE)

    print(Cox)

    # Validation
    # Build validation data set (validation group is coded by 1)
    Validation <- subset(mydata, KOHORTE == 1)

    # Define new survival object for the validation data set
    ZeitV <- Validation$UEBERLEBEN
    StatusV <- Validation$TOD
    V <- Surv(ZeitV, StatusV)
    Valid <- val.surv(Cox, newdata = Validation, S = V)
    plot(Valid)
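In the code above, the formula refers to free-standing vectors (`T`, `N`, `M`, `Alter`, `Histo`), while `Validation` appears to hold only the raw `mydata` columns (`HIGHEST_T`, `N_STAGE_EINFACH`, ...), so the names may not match. A common pattern is to keep every predictor as a named column of a data frame and pass `data =` to the fitting function, so `newdata` only needs the same column names. The sketch below is an illustration under assumptions, not the poster's model: it swaps in `survival::coxph()` and the `lung` data as a hypothetical stand-in for `mydata`, and the 100-row training split is arbitrary. `cph()` in rms also takes a `data` argument, so the same idea should carry over, but that is not shown here.

```r
library(survival)

# Hypothetical stand-in for 'mydata': the 'lung' data shipped with survival.
# Keep only the columns used below and drop rows with missing values.
d <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])
d$sex <- factor(d$sex, levels = 1:2, labels = c("male", "female"))

# Hypothetical cohort indicator: first 100 rows training (0), the rest validation (1)
d$KOHORTE <- ifelse(seq_len(nrow(d)) <= 100, 0, 1)

train <- subset(d, KOHORTE == 0)
valid <- subset(d, KOHORTE == 1)

# Every predictor is a column of 'train', so predict() can find it in 'valid'
fit <- coxph(Surv(time, status == 2) ~ age + sex + ph.ecog, data = train)

# Linear predictor for each validation patient: one value per row of 'valid'
lp_valid <- predict(fit, newdata = valid, type = "lp")
```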
Chris
  • Note that Harrell would argue strongly against separate training and validation sets with such a small sample size. See [this answer](https://stats.stackexchange.com/a/64436/28500) for example. Your question seems to be specifically about a problem with running a program rather than an issue in statistics per se, so it is likely to be judged off-topic here. See [this page](https://stats.meta.stackexchange.com/q/793/28500) for links to help with respect to running programs. – EdM Feb 13 '19 at 21:30
