I want to evaluate my Cox model using the lifelines package for a time-varying covariate problem. However, when I use `lifelines.CoxTimeVaryingFitter` I get a convergence error. I have already transformed my dataset into long format. What's perplexing is that my dataset is quite large. It consists mostly of continuous numeric values, with no categorical variables except one binary variable.

How I am calling the lifelines function:

import lifelines

cph = lifelines.CoxTimeVaryingFitter()
cph.fit(train, id_col="loan_number", start_col="start", stop_col="stop", event_col="default_flag", show_progress=True, step_size=0.5)
cph.print_summary()

My dataset is of size: (11293449,13). The datatypes of the covariates that I'm using are:

default_flag       int32
co_flag            uint8
Orig_CLTV        float64
orig_FICO        float64
orig_DTI         float64
orig_UPB         float64
HPI               object
State_unemp      float64
Mortgage_Rate    float64
orig_coupon      float64
orig_term        float64
loan_number        int64
start              int64
stop               int64

A sample of the dataset, from calling `head()` on the dataframe, was attached as an image.

Any help is much appreciated.

Python Error Message: ConvergenceError: Convergence halted due to matrix inversion problems. Suspicion is high colinearity. Please see the following tips in the lifelines documentation:

EDIT: The original dataset, before reformatting with the `to_long_format` function, was attached as a second image. The code for transforming the dataset is: `y = to_long_format(x, duration_col='age')`

Josh

1 Answer

A few thoughts:

  • Something looks wrong with the `start` column: it should be different for each row of a subject (as `stop` is).
  • look for constant columns. One quick and dirty check is to start dropping columns and see if it converges.
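
The column check suggested above can be sketched with pandas instead of dropping columns one at a time. This is only a rough illustration, not part of lifelines: the helper `find_problem_columns`, the 0.98 correlation threshold, and the toy column `orig_UPB_thousands` are all hypothetical. It flags constant columns, non-numeric columns (such as an `object`-typed `HPI`), and pairs of covariates that are almost perfectly correlated.

```python
import pandas as pd


def find_problem_columns(df, covariates, corr_threshold=0.98):
    """Return (constant, non_numeric, correlated_pairs) for the given covariates.

    Hypothetical helper: the threshold is an arbitrary illustrative choice.
    """
    X = df[covariates]
    # Columns that are not numeric (e.g. dtype object) cannot enter the model as-is.
    non_numeric = [c for c in covariates if not pd.api.types.is_numeric_dtype(X[c])]
    numeric = X.drop(columns=non_numeric)
    # A column with a single distinct value carries no information.
    constant = [c for c in numeric.columns if numeric[c].nunique(dropna=False) <= 1]
    # Pairwise absolute correlation among the remaining covariates.
    corr = numeric.drop(columns=constant).corr().abs()
    cols = list(corr.columns)
    correlated_pairs = [
        (cols[i], cols[j], float(corr.iloc[i, j]))
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
        if corr.iloc[i, j] >= corr_threshold
    ]
    return constant, non_numeric, correlated_pairs


# Toy data (made up): one constant column, one rescaled duplicate, one string column.
toy = pd.DataFrame({
    "orig_FICO": [700.0, 650.0, 720.0, 680.0],
    "orig_term": [360.0, 360.0, 360.0, 360.0],
    "orig_UPB": [100.0, 200.0, 150.0, 300.0],
    "orig_UPB_thousands": [0.1, 0.2, 0.15, 0.3],  # hypothetical duplicate of orig_UPB
    "HPI": ["1.0", "1.1", "1.2", "1.3"],          # stored as strings, like dtype object
})
constant, non_numeric, correlated_pairs = find_problem_columns(toy, list(toy.columns))
print(constant, non_numeric, correlated_pairs)
```

On an 11-million-row dataframe the correlation matrix may be slow to compute, so running the check on a random sample of loans is probably enough to spot exact duplicates or constants.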
Cam.Davidson.Pilon
  • Thank you, I believe the problem lies in how I'm converting to long format. In your opinion, would a dataset that looks identical to the above (except for the `start` and `stop` columns) cause an issue? The `duration_col` is "age", which is essentially an enumerated column (i.e. it starts at 6 and goes to n) that increments with each row. – Josh Aug 19 '20 at 19:00
  • I think that should be okay, but it's hard for me to reason without seeing the original dataset and code – Cam.Davidson.Pilon Aug 19 '20 at 19:05
  • I have provided a second image of what the original dataset looks like. The transformation call is very simple (as mentioned above). I can't seem to get the `start` column to enumerate from 0 and change on every row; it stays at 0 (as shown in the first image). – Josh Aug 19 '20 at 19:20
  • ah, your dataset already is in the long format :) You'll need to do some slight pandas work to create the "start" and "stop" columns. Hint: something like `df['start'] = df.groupby("loan_number")['age'].shift(1)` (not tested, but something like that). – Cam.Davidson.Pilon Aug 19 '20 at 22:33
  • Got it, thank you. I just looked at the `rossi` dataset and it looks like as soon as the event happens (in my case, default = 1), the id sequence ends. `start` always increases from row to row, and in the case of an event the sequence is just cut short. Just looking to confirm the logic. – Josh Aug 19 '20 at 23:01
  • that sounds right, and "stop" is start + 1 (if I understand correctly) – Cam.Davidson.Pilon Aug 19 '20 at 23:32
  • One question: I want to debug the `hazard rate` that is produced for a particular variable. In my code, I'm seeing behavior that contradicts reality, i.e. a covariate behaves in exactly the opposite way to how it should in real life. When I sort the covariate from lowest to highest, its respective `hazard rate` decreases, whereas I expect it to increase. Calculation: `np.exp((x-mean)*coef)`. The higher this hazard rate, the higher the chance of an event occurring. – Josh Aug 27 '20 at 11:30
  • could it be due to confounding, observed or unobserved? Also see the "Table 2 fallacy" – Cam.Davidson.Pilon Aug 27 '20 at 14:56
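
The `start`/`stop` construction discussed in the comments could look roughly like the sketch below. It is run on made-up toy data, and it assumes one row per loan per period with `age` increasing by 1. It also assumes that the first interval of each loan begins one period before its first observed `age` (consistent with "stop is start + 1"); that choice is an assumption, not something lifelines prescribes. Rows after a loan's first default are dropped so that each sequence ends at the event.

```python
import pandas as pd

# Toy data (made up): loan 1 defaults at age 8 and has a stray row afterwards.
df = pd.DataFrame({
    "loan_number": [1, 1, 1, 1, 2, 2],
    "age": [6, 7, 8, 9, 6, 7],
    "default_flag": [0, 0, 1, 1, 0, 0],
})

# Order rows within each loan by age.
df = df.sort_values(["loan_number", "age"]).reset_index(drop=True)

# Count events strictly before each row, then keep rows up to and including the first event.
prior_events = df.groupby("loan_number")["default_flag"].cumsum() - df["default_flag"]
df = df[prior_events == 0].reset_index(drop=True)

# Each row covers the interval (start, stop]: stop is the current age,
# start is the previous row's age within the same loan.
df["stop"] = df["age"]
df["start"] = df.groupby("loan_number")["age"].shift(1)

# Assumption: the first interval of a loan starts one period before its first age.
df["start"] = df["start"].fillna(df["stop"] - 1).astype("int64")

print(df[["loan_number", "start", "stop", "default_flag"]])
```

A frame shaped like this can then be passed to `CoxTimeVaryingFitter.fit` with the same `id_col`, `start_col`, `stop_col`, and `event_col` arguments used in the question.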