For context, I'm using a Cox PH model (survival analysis) from the lifelines package to predict when a customer will do something, if that ever happens at all.
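For completeness, the model is fitted roughly like this (a minimal sketch; train_df is a placeholder training frame with the same duration and event columns as the output below):

from lifelines import CoxPHFitter

cph = CoxPHFitter()
# column names match the data shown further down
cph.fit(train_df, duration_col='length_of_arrears', event_col='cured')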
The only metrics built into lifelines are the concordance index and the log-likelihood.
As far as I know, the concordance index is the equivalent of a rank correlation for censored data, or of a ROC AUC (I've seen both interpretations).
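For reference, the same index can also be computed directly via lifelines.utils; a minimal sketch, where df is a placeholder frame with the duration and event columns shown further down:

from lifelines.utils import concordance_index

# fraction of comparable pairs (under censoring) where the predicted
# ordering of survival times agrees with the observed ordering
ci = concordance_index(
    df['length_of_arrears'],     # observed durations
    cph.predict_median(df),      # predicted medians (higher = survives longer)
    df['cured'],                 # event indicator (1 = event observed)
)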
I wanted to know whether there is a metric that would capture both accuracy (in terms of the event ever happening or not) and "precision" (in the sense of how far the predictions deviate from the observed times for the correctly classified cases).
The output of my model is the predicted median number of days until the event happens.
Does such a performance metric exist?
I thought about checking the concordance index by groups (say, those who cured by day 10, between days 10 and 20, days 20 and 30, etc.) to make it a bit more precise... but I'm not sure that's the way to go. Maybe even the MAE in conjunction with accuracy, to get a fuller picture? (A sketch of that combination follows the output at the end.)
Here is a bit of code so you get the idea:
import pandas as pd
from sklearn.metrics import classification_report

### Between 20 and 30 days ###
test_20and30 = imputed_df.loc[(imputed_df['length_of_arrears'] > 20) & (imputed_df['length_of_arrears'] <= 30)]
cph.predict_median(test_20and30)
Out[480]:
17 8.0
30 15.0
40 27.0
49 11.0
55 6.0
67 423.0
88 11.0
126 20.0
146 7.0
148 6.0
150 11.0
169 14.0
186 8.0
190 12.0
204 10.0
215 28.0
242 15.0
282 15.0
287 7.0
299 9.0
308 14.0
325 9.0
357 98.0
364 21.0
Name: 0.5, dtype: float64
test_20and30.cured
Out[481]:
17 1.0
30 0.0
40 0.0
49 0.0
55 0.0
67 0.0
88 1.0
126 1.0
146 0.0
148 1.0
150 0.0
169 1.0
186 0.0
190 1.0
204 1.0
215 0.0
242 1.0
282 0.0
287 0.0
299 0.0
308 1.0
325 0.0
357 0.0
364 1.0
Name: cured, dtype: float64
test_20and30.length_of_arrears
Out[482]:
17 22.0
30 21.0
40 28.0
49 26.0
55 24.0
67 28.0
88 21.0
126 27.0
146 26.0
148 24.0
150 27.0
169 23.0
186 26.0
190 22.0
204 26.0
215 26.0
242 30.0
282 23.0
287 25.0
299 27.0
308 22.0
325 26.0
357 27.0
364 27.0
Name: length_of_arrears, dtype: float64
# concordance index
cph.score(test_20and30, scoring_method='concordance_index')
Out[484]: 0.60431654676259
# predicted medians, computed once and reused
predicted_median = cph.predict_median(test_20and30)
cured_ornot = pd.DataFrame(index=predicted_median.index)
cured_ornot['cured'] = 0
# flag rows whose predicted median falls inside the 20-30 day window
cured_ornot.loc[(predicted_median > 20) & (predicted_median <= 30), 'cured'] = 1
print(classification_report(test_20and30.cured, cured_ornot['cured']))
              precision    recall  f1-score   support

         0.0       0.57      0.86      0.69        14
         1.0       0.33      0.10      0.15        10

    accuracy                           0.54        24
   macro avg       0.45      0.48      0.42        24
weighted avg       0.47      0.54      0.46        24
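To illustrate the MAE-plus-accuracy idea, here is a rough sketch of what I have in mind (the MAE is computed on the uncensored rows only, which naively ignores censoring; accuracy_score and mean_absolute_error are from scikit-learn):

from sklearn.metrics import accuracy_score, mean_absolute_error

# "accuracy": is the prediction in the right bucket?
acc = accuracy_score(test_20and30['cured'], cured_ornot['cured'])

# "precision" in my sense: how many days off are the predicted medians
# for the rows where the event was actually observed (cured == 1)?
observed = test_20and30['cured'] == 1
mae = mean_absolute_error(
    test_20and30.loc[observed, 'length_of_arrears'],
    cph.predict_median(test_20and30).loc[observed],
)
print(f'accuracy: {acc:.2f}, MAE on observed events: {mae:.1f} days')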