I want to evaluate my Cox model using the lifelines package for a time-varying covariate problem. However, when I use lifelines.CoxTimeVaryingFitter I get a convergence error. I have already transformed my dataset into long format. What is perplexing is that my dataset is fairly large. It consists mostly of continuous numeric values, with no categorical variables except one binary variable.
This is how I am calling the lifelines function:
import lifelines

cph = lifelines.CoxTimeVaryingFitter()
cph.fit(train, id_col="loan_number", start_col="start", stop_col="stop", event_col="default_flag", show_progress=True, step_size=0.5)
cph.print_summary()
My dataset has shape (11293449, 13). The dtypes of the covariates that I'm using are:
default_flag int32
co_flag uint8
Orig_CLTV float64
orig_FICO float64
orig_DTI float64
orig_UPB float64
HPI object
State_unemp float64
Mortgage_Rate float64
orig_coupon float64
orig_term float64
loan_number int64
start int64
stop int64
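One detail in this listing may be worth ruling out: HPI is stored as object rather than as a numeric dtype. Whether that is related to the convergence error is not established here. The following is only a minimal sketch of how non-numeric columns could be found and coerced, using a toy frame with invented values (the name toy and its contents are assumptions, not the real data):

```python
import pandas as pd

# Toy stand-in for the real frame; the values are invented for illustration.
toy = pd.DataFrame({
    "orig_FICO": [700.0, 650.0, 720.0],
    "HPI": ["101.5", "99.8", "n/a"],  # strings, so not a numeric dtype
})

# Columns that are not numeric and would need converting before fitting.
non_numeric = toy.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# Coerce to a numeric dtype; entries that cannot be parsed become missing values.
toy["HPI"] = pd.to_numeric(toy["HPI"], errors="coerce")
n_unparsed = int(toy["HPI"].isna().sum())
print(n_unparsed, toy["HPI"].dtype)
```

Rows where the coercion produced a missing value would still need to be dropped or imputed before fitting.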
Below is a sample of the dataset from calling the head method on the dataframe:
Any help is much appreciated.
Python Error Message: ConvergenceError: Convergence halted due to matrix inversion problems. Suspicion is high colinearity. Please see the following tips in the lifelines documentation:
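Since the message points at collinearity, a first screen could look for constant columns and for pairs of covariates that are almost perfectly correlated. The sketch below does this on synthetic data. The frame toy, the way its columns are generated, and the 0.95 threshold are all assumptions made for illustration, not properties of the real dataset:

```python
import numpy as np
import pandas as pd

# Synthetic covariates (invented): one near-duplicate pair and one constant column.
rng = np.random.default_rng(0)
n = 500
rate = rng.normal(4.0, 0.5, n)
toy = pd.DataFrame({
    "Mortgage_Rate": rate,
    "orig_coupon": rate + rng.normal(0.0, 0.001, n),  # almost a copy of Mortgage_Rate
    "orig_term": np.full(n, 360.0),                   # no variation at all
    "orig_DTI": rng.normal(35.0, 8.0, n),
})

# Columns with (near) zero variance carry no information for the fit.
constant_cols = [c for c in toy.columns if toy[c].std() < 1e-8]

# Among the remaining columns, list pairs whose absolute correlation is very high.
corr = toy.drop(columns=constant_cols).corr().abs()
cols = list(corr.columns)
high_pairs = [
    (a, b)
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
    if corr.loc[a, b] > 0.95  # arbitrary threshold, chosen for this example
]

print(constant_cols)
print(high_pairs)
```

On the real data the same check would be run on the covariate columns only, excluding loan_number, start, stop and default_flag.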
EDIT: The original dataset, prior to reformatting with the to_long_format function, is as below:
Code for transforming the dataset is: y = to_long_format(x, duration_col = 'age')
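To make the effect of that call concrete, here is a pandas-only approximation of what to_long_format is understood to do, on the assumption that it adds a start column of 0 and turns the duration column into stop. The frame x_toy and its values are invented, and this is a sketch of the assumed behaviour, not the library's implementation:

```python
import pandas as pd

# Invented wide-format rows: one row per loan, with 'age' as the duration.
x_toy = pd.DataFrame({
    "loan_number": [1, 2],
    "age": [24, 36],
    "default_flag": [1, 0],
})

# Assumed equivalent of to_long_format(x_toy, duration_col="age"):
# every row starts at 0 and stops at its duration; the duration column is dropped.
y_toy = x_toy.assign(start=0, stop=x_toy["age"]).drop(columns="age")
print(y_toy)
```

If that understanding is right, the call does not add rows by itself, so the number of intervals per loan_number in the result matches the number of rows per loan in the input.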