I have been using the excellent R survey package for survival analysis of complex survey data. I need to migrate to Python, and I have found that the Python package lifelines lets you specify sampling weights and clusters in CoxPHFitter. For example, reusing pieces of code from their tutorial, I would use:
import pandas as pd
from lifelines import CoxPHFitter
df = pd.DataFrame({
    'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
    'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
    'weights': [1.1, 0.5, 2.0, 1.6, 1.2, 4.3, 1.4, 4.5, 3.0, 3.2, 0.4, 6.2],
    'month': [1, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
    'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
})
cph = CoxPHFitter()
cph.fit(df, 'T', 'E', weights_col='weights', cluster_col='id', robust=True)
cph.print_summary()
to fit a Cox proportional hazards model. Would this be equivalent to using svycoxph from the R survey package?
N.B.: I would add the lifelines tag, but it does not exist and I do not have the minimum reputation (300) to create it. I'd appreciate it if somebody could edit this question to add that tag.