I have the following problem and would like to discuss it to check whether my reasoning is correct.
Data description:
I have recurrent data from a clinical trial. The data include a specific test, mtest, that is performed each year to indicate the disease degree, scale. The mtest value ranges from 1 to 5. I have about 60,000 records representing 10,200 patients, because some patients have more than one record (recurrent data). The records have the following format: ID, start, stop, mtest, scale.
Data Sample
Here is a sample from the data:
ID start stop mtest scale
1 7 8 1 0
2 2 3 1 0
2 3 4 1 0
2 4 5 1 0
2 5 6 1 1
2 6 7 1 0
2 7 8 1 0
3 2 3 1 0
3 3 4 1 0
3 4 5 1 1
3 5 6 2 1
3 6 7 1 0
4 1 2 1 0
4 2 3 1 0
4 3 4 1 0
4 5 6 1 0
4 6 7 1 0
4 7 8 1 0
5 2 3 1 0
5 3 4 1 0
5 4 5 1 0
This sample may not reflect the exact nature of the data because, as I said before, mtest ranges from 1 to 5.
Goal
I want to use the mtest covariate to predict scale for the coming years.
Progress
I read some articles and posts about survival analysis. As I understand it, I can use a PH (proportional hazards) model to describe the hazard, or an AFT (accelerated failure time) model to predict the time to event (correct me if I am wrong). Based on that understanding, I will use AFT.
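For reference, this is the log-linear form I believe a Weibull AFT model takes. It is only a sketch of my understanding: the symbols \beta_0, \beta_1, \sigma and W_i are my own notation, and I am assuming the only covariate is mtest.

```latex
% Weibull AFT model on the log-time scale (notation is mine, not from the software output)
\log T_i = \beta_0 + \beta_1\,\mathrm{mtest}_i + \sigma W_i,
\qquad W_i \sim \text{standard (minimum) extreme value distribution}

% Implied ratio of time scales for a one-unit increase in mtest
\frac{T(\mathrm{mtest}+1)}{T(\mathrm{mtest})} = e^{\beta_1}
```

If this form is right, the coefficient acts multiplicatively on the time scale rather than on the hazard, which is the part I would like confirmed.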
I used R; below is what I have:
library(survival)
Sur <- Surv(test$start, test$stop, test$scale, type = 'interval')
model <- survreg(Sur ~ mtest + cluster(ID), data = test, dist = 'w')
summary(model)
The summary for the above model is shown below:
Call:
survreg(formula = Su ~ mtest + cluster(ID), data = test,
dist = "w")
Value Std. Err (Naive SE) z p
(Intercept) 2.691 0.0274 0.0223 98.3 0.00e+00
mtest -0.307 0.0139 0.0124 -22.1 4.92e-108
Log(scale) -1.320 0.0239 0.0181 -55.3 0.00e+00
Scale= 0.267
Weibull distribution
Loglik(model)= -8210.4 Loglik(intercept only)= -8474.8
Chisq= 528.78 on 1 degrees of freedom, p= 0
(Loglikelihood assumes independent observations)
Number of Newton-Raphson Iterations: 8
n= 25191
Then I used the predict function from R as follows:
pre <- predict(model)
which gave me the following results:
[1] 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
[11] 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
[21] 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
[31] 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
[41] 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
[51] 10.853148 10.853148 10.853148 7.987366 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
[61] 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
[71] 10.853148 10.853148 10.853148 7.987366 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
[81] 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
[91] 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148 10.853148
**snipped
Questions
1) The summary: as you can see, the model gives a coefficient of -0.307 for mtest. Why is it negative, and what is the proper interpretation of it?
2) Predict function results: the result of the predict function does not make sense to me. Say the follow-up for the patients runs from 2000 to 2017; how can I predict the scale value in 2020 for a specific patient? For example, I used the following:
casedat <- list(mtest = 1, ID = 'Id90')
pr <- 1:99/100
pre <- predict(model, newdata = casedat)
which gave me
> pre
1
10.85315
So what does 10.85315 mean?
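As a sanity check, I tried to reproduce the predicted values by hand. This is only a sketch: I am assuming that predict() defaults to type = "response" and that, for a Weibull survreg fit, this is the exponential of the linear predictor. The variable names below are mine, and the coefficients are the rounded values from the summary.

```r
# Coefficients copied from the model summary (rounded to 3 decimals)
b0 <- 2.691   # (Intercept)
b1 <- -0.307  # mtest

# The two mtest values I assume produced the two distinct predictions
mtest <- c(1, 2)

# Linear predictor on the log-time scale, then back-transformed
lp <- b0 + b1 * mtest
pred_time <- exp(lp)

print(round(pred_time, 2))
```

The two values come out close to 10.853 and 7.987 (the small difference is from rounding the coefficients), so the predictions seem to be exp(intercept + coefficient * mtest). What I still do not understand is how to read that number as a forecast of scale in a given year such as 2020.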
Many thanks in advance; I appreciate all your comments.