Prediction with accelerated failure time in R for clinical data

Question

So I have the following problem and I want to discuss it with you to see if I am thinking correctly.

Data description:

I have recurrent data from a clinical trial. A specific test, mtest, is performed each year to indicate the degree of disease; mtest ranges from 1 to 5. I have about 60,000 records representing 10,200 patients, because some patients have more than one record (recurrent data). Each record has the fields ID, start, stop, mtest, scale.

Data sample: here is a sample from the data.

ID start stop mtest scale
1     7    8      1   0
2     2    3      1   0
2     3    4      1   0
2     4    5      1   0
2     5    6      1   1
2     6    7      1   0
2     7    8      1   0
3     2    3      1   0
3     3    4      1   0
3     4    5      1   1
3     5    6      2   1
3     6    7      1   0
4     1    2      1   0
4     2    3      1   0
4     3    4      1   0
4     5    6      1   0
4     6    7      1   0
4     7    8      1   0
5     2    3      1   0
5     3    4      1   0
5     4    5      1   0

This sample may not reflect the exact nature of the data because, as I said above, mtest ranges from 1 to 5.

Goal

I want to use the mtest covariate to predict the scale for the coming years.

Progress

I read some articles and posts about survival analysis. As I understand it, I can use a PH (proportional hazards) model to describe the hazard, or an AFT (accelerated failure time) model to predict the time to an event (correct me if I am wrong). Based on that understanding, I will use AFT.

I used R; here is what I have:

library(survival)
Sur <- Surv(test$start, test$stop, test$scale, type='interval')
model <- survreg(Sur ~ mtest + cluster(ID), data = test, dist = 'w')
summary(model)

The summary of the above model is shown below:

Call:
 survreg(formula = Su ~ mtest + cluster(ID), data = test, 
    dist = "w")
             Value Std. Err (Naive SE)     z         p
 (Intercept)  2.691   0.0274     0.0223  98.3  0.00e+00
 mtest       -0.307   0.0139     0.0124 -22.1 4.92e-108
 Log(scale)  -1.320   0.0239     0.0181 -55.3  0.00e+00

 Scale= 0.267 

 Weibull distribution
 Loglik(model)= -8210.4   Loglik(intercept only)= -8474.8
     Chisq= 528.78 on 1 degrees of freedom, p= 0 
 (Loglikelihood assumes independent observations)
 Number of Newton-Raphson Iterations: 8 
 n= 25191

Then I used the predict function in R as follows:

pre <- predict(model)

which gave me the following results :

 [1] 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
 [11] 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
 [21] 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
 [31] 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
 [41] 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
 [51] 10.853148 10.853148 10.853148  7.987366 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
 [61] 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
 [71] 10.853148 10.853148 10.853148  7.987366 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
 [81] 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
 [91] 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
[output snipped]

Questions

1) The summary: the model reports a coefficient of -0.307 for mtest. Why is it negative, and what is the proper interpretation?

2) Predict function results: the results from the predict function do not make sense to me. Say the follow-up for the patients runs from 2000 to 2017; how can I predict the scale value in 2020 for a specific patient? For example, I used the following:

casedat <- list(mtest=1, Id= 'Id90')
pr=1:99/100
pre <- predict(model, newdata = casedat)

which gave me

> pre
        1 
 10.85315

So what does 10.85315 mean?

Many thanks in advance; I appreciate all your comments.

where is your scale variable? – Deep North Oct 02 '17 at 12:19
Sorry @DeepNorth it's an `event`, I will edit this too – Abdal Oct 02 '17 at 13:57

Deep North · Answer 1 · 2017-10-03T01:26:26.630

I think that if you give more detail about your dataset, a lot of people on this site will be able to help you.

You say "The data has a specific mtest that is performed each year to indicate disease degree scale" and from your code below:

 Surv(test$start, test$stop, test$scale, type='interval')

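The above code shows that `scale` should be the indicator variable that identifies whether the subject had an event or was censored, i.e. `scale` should be a dummy variable with value 0 or 1. You can type `?Surv` in R to check. You have also declared the data as interval censored here; note that with `type='interval'` the status codes are 0 = right censored, 1 = event at `time`, 2 = left censored, 3 = interval censored.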

Surv(time, time2, event,
type=c('right', 'left', 'interval', 'counting', 'interval2', 'mstate'), origin=0)

Your goal is to use the mtest covariate to predict `scale` for the coming years, i.e. to predict yes or no (for the `scale` variable).

I hope I understand you well.

However, what you did is to predict survival time of a subject, not the "event" of the subject.

Since I don't have your dataset, let us use the Stanford heart transplant data (the `heart` dataset in the survival package) to show how the prediction works.

 library(survival)

 # shift the start times by 0.1 because the Weibull fit cannot handle a time of zero
 heart$start <- heart$start + 0.1

 Sur <- Surv(heart$start, heart$stop, heart$event, type = "interval")

 model <- survreg(Sur ~ age + year, data = heart, dist = "w")

 summary(model)

The results are:

 Call:
survreg(formula = Sur ~ age + year, data = heart, dist = "w")
             Value  Std. Error     z        p
(Intercept) 1.53398     0.4807 3.191 1.42e-03
age         0.00701     0.0304 0.231 8.17e-01
year        0.56026     0.1454 3.854 1.16e-04
Log(scale)  0.75931     0.0846 8.975 2.84e-19

Scale= 2.14 

 Weibull distribution
 Loglik(model)= -272.6   Loglik(intercept only)= -280.6
    Chisq= 15.85 on 2 degrees of freedom, p= 0.00036 
 Number of Newton-Raphson Iterations: 7 
 n= 172

Now let us use predict:

   pre <- predict(model)

The results are

 [1]   4.404946   5.493280   5.623295   5.623295   5.779412   5.779412   5.385467   7.191498
 [9]   7.324740   7.324740   7.266231   7.450016   7.232708   7.232708   7.562176   7.562176
[17]   8.250926   8.356734   8.356734   8.336692   8.336692   8.415050   8.532549   8.532549
[25]   6.978359   9.070038   9.070038   9.459353  10.283023  10.283023   9.501070   9.501070
[33]  10.140262  10.140262  11.734231  11.734231  11.450256  11.450256
  ....

These are simply the 172 predicted times, one for each record in the data.

Let us check the predicted survival time for subject 1, using age = -17.15537303 and year = 0.12320329:

  p1<-predict(model, newdata=data.frame(year=0.12320329,age=-17.15537303),type="response")
  p1

Note we specified type="response" here.

The result is

   > p1
   1 
  4.404946

This is exactly the first predicted value above, i.e. the predicted survival time for that record.

So for your data, what you predicted is a survival time, i.e. the "response", not the event.
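As a check on the numbers in the question, and assuming predict() was called with its default type = "response" (which for a Weibull survreg fit returns the exponential of the linear predictor), the two distinct values in the question's output can be reproduced from the printed coefficients:

$$
\begin{aligned}
\hat T(\text{mtest}=1) &= \exp(2.691 - 0.307 \times 1) = \exp(2.384) \approx 10.85 \\
\hat T(\text{mtest}=2) &= \exp(2.691 - 0.307 \times 2) = \exp(2.077) \approx 7.98
\end{aligned}
$$

These agree with 10.853148 and 7.987366 up to rounding of the printed coefficients. So 10.85 is a predicted time to event (in the units of start and stop) for a record with mtest = 1; it is not a predicted value of scale.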

Also, I think you cannot predict an event itself, but you can predict the probability of an event within a period of time. I think you need to specify type='quantile' (together with p), which gives the times by which a given fraction of subjects is predicted to have had the event.

I think the prediction is much easier to understand with right-censored data. I am not even sure what the prediction means for interval-censored data.
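To make the "probability of an event during a period of time" idea concrete, here is a minimal sketch on right-censored data. It is not the asker's model: it assumes the `lung` dataset shipped with the survival package as a stand-in, and the covariate value (age = 60) and the 365-day horizon are hypothetical choices for illustration. It relies on the fact that survreg's Weibull fit corresponds to a Weibull distribution with shape 1/scale and scale exp(linear predictor).

```r
library(survival)  # provides survreg(), Surv() and the lung data

# Right-censored Weibull AFT fit (illustrative stand-in data, not the asker's)
fit <- survreg(Surv(time, status) ~ age, data = lung, dist = "weibull")

new_patient <- data.frame(age = 60)  # hypothetical covariate value

# Linear predictor on the log-time scale;
# exp(lp) is what type = "response" returns
lp <- predict(fit, newdata = new_patient, type = "lp")

# Median time to event: the time by which the event probability reaches 0.5
med <- predict(fit, newdata = new_patient, type = "quantile", p = 0.5)

# Probability of an event by a chosen horizon under the fitted Weibull
t_horizon <- 365  # hypothetical horizon, in the data's time unit (days)
p_event <- pweibull(t_horizon, shape = 1 / fit$scale, scale = exp(lp))

print(c(median_time = unname(med), p_event_by_horizon = unname(p_event)))
```

The quantile prediction answers "by when?", while the pweibull() call answers "how likely by time t?"; both come from the same fitted distribution.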

The meaning of the coefficient -0.307 can be seen from the Weibull model:

$\log(T)=\beta_0+\beta_1x+\sigma\epsilon$, supposing your mtest is a continuous variable.

$\log(T_1)=\beta_0-0.307(x+1)+\sigma\epsilon$

$\log(T_0)=\beta_0-0.307x+\sigma\epsilon$

$\Rightarrow \log(T_1)-\log(T_0)=-0.307$

This means that when x increases by one unit, the log of the survival time decreases by 0.307, whatever unit your time is measured in.

Equivalently, $\log(\frac{T_1}{T_0})=-0.307$, so the ratio $T_1/T_0$ is below 1: when mtest increases by 1 unit, the survival time becomes shorter.

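The sign is interpreted in the opposite direction to a Cox model: there, a positive coefficient means a higher hazard and so shorter survival, whereas here a negative coefficient means a shorter survival time.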
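For a Weibull model specifically, the AFT coefficient can be converted to the proportional-hazards scale, which makes the sign flip explicit. A rough sketch using the rounded values printed in the question's summary ($\beta_1=-0.307$, Scale $\sigma=0.267$) and the standard relation between the two Weibull parameterisations:

$$
\frac{T_1}{T_0} = e^{\beta_1} = e^{-0.307} \approx 0.74,
\qquad
\text{HR} = e^{-\beta_1/\sigma} = e^{0.307/0.267} \approx e^{1.15} \approx 3.2
$$

So each one-unit increase in mtest multiplies the predicted event time by about 0.74 (roughly 26% shorter) and, equivalently, multiplies the hazard by about 3.2. These are back-of-envelope figures from rounded output, and they inherit whatever problems the model specification itself has.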

Thank you @DeepNorth. I added a sample from my data; I hope this makes things clearer. Now what does `4.404946` mean? Does it mean "we expect that patient p1 will survive for **4.404946** units of time"? And what is the meaning of the negative sign of `-0.307` shown in the summary of my code? – Abdal Oct 02 '17 at 13:53
