I have fitted both a Kaplan-Meier and a Nelson-Aalen model using the lifelines Python library. The .event_table results from the two models match exactly. However, when I try to recalculate .cumulative_density_ from KaplanMeierFitter() and .cumulative_hazard_ from NelsonAalenFitter() by hand, my results do not match the fitted ones.

$$ P\left(die(t_i)\right) = \sum_{t\leq t_i}P_{die(t)} $$

I have used the above formula for cumulative density.

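$$ \hat{H}(t) = \sum_{t_i\leq t}\frac{d_i}{n_i} $$

I have used the above formula for cumulative hazard, where $d_i$ is the number of events at time $t_i$ and $n_i$ is the number at risk just before $t_i$.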
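
A minimal sketch of this kind of manual recalculation, with d_i deaths and n_i subjects at risk at each event time. The counts are made-up placeholders, not the data from the table, and `event_table` and `manual_estimates` are hypothetical names used only for illustration. The probability of dying at each time is taken as the Kaplan-Meier survival just before that time multiplied by d_i / n_i, so the running sum equals one minus the survival estimate.

```python
# Hypothetical event-table rows: (event time, deaths d_i, number at risk n_i).
# These counts are illustrative only, not the data from the question.
event_table = [(1, 2, 10), (2, 1, 8), (4, 3, 6), (5, 1, 2)]


def manual_estimates(rows):
    """Return (cumulative densities, cumulative hazards), one value per event time."""
    survival = 1.0  # Kaplan-Meier S(t) just before the current event time
    density = 0.0   # running sum of P(die at t)
    hazard = 0.0    # running sum of d_i / n_i, with no correction for ties
    densities, hazards = [], []
    for _time, deaths, at_risk in rows:
        p_die = survival * deaths / at_risk   # P(die at this time)
        density += p_die
        survival *= 1.0 - deaths / at_risk    # update S(t) after this time
        hazard += deaths / at_risk
        densities.append(density)
        hazards.append(hazard)
    return densities, hazards


cumulative_density, cumulative_hazard = manual_estimates(event_table)
for (time, _, _), f, h in zip(event_table, cumulative_density, cumulative_hazard):
    print(f"t={time}: cumulative density={f:.4f}, cumulative hazard={h:.4f}")
```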

I have also attached the table of results from the fitting.

[Image: table of results from the fitted models]

Any suggestions would be appreciated. Thanks!

Avraham

1 Answer

The results agree to within 1 unit in the second or third significant figure, which isn't too bad and probably wouldn't be visible in plots.

I suspect that any discrepancies have to do with the handling of tied event times, which can complicate survival models that implicitly assume continuous time and thus no truly tied event times. For example, Cox survival regression models have at least 3 different ways of handling tied times, leading to slightly different apparent results. Your manual calculations evidently made no correction for ties.
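
To illustrate how much a tie correction can matter, here is a minimal sketch comparing the plain Nelson-Aalen increment d/n with one common correction that treats d tied events as happening one after another, so the risk set shrinks by one after each event. The counts and function names are hypothetical, and whether a given package applies exactly this correction is an assumption to check against its documentation.

```python
def na_increment_plain(deaths, at_risk):
    """Plain Nelson-Aalen increment: all tied deaths share one risk set."""
    return deaths / at_risk


def na_increment_tie_corrected(deaths, at_risk):
    """Tie-corrected increment: deaths occur sequentially, so the risk set
    is at_risk, at_risk - 1, ..., at_risk - deaths + 1."""
    return sum(1.0 / (at_risk - k) for k in range(deaths))


# Hypothetical rows: (event time, deaths, number at risk); illustrative only.
rows = [(1, 2, 10), (2, 1, 8), (4, 3, 6), (5, 1, 2)]

total_plain = 0.0
total_corrected = 0.0
for time, deaths, at_risk in rows:
    total_plain += na_increment_plain(deaths, at_risk)
    total_corrected += na_increment_tie_corrected(deaths, at_risk)
    print(f"t={time}: plain={total_plain:.4f}, tie-corrected={total_corrected:.4f}")
```

The two versions coincide whenever there is a single event at a time point, and the gap between them shrinks as the number at risk grows relative to the number of tied events.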

A lifelines manual page describes a parameter setting that might explain some of your discrepancies:

nelson_aalen_smoothing (bool, optional) – If the event times are naturally discrete (like discrete years, minutes, etc.) then it is advisable to turn this parameter to False.

Your data do have naturally discrete time values and thus lots of tied event times, so the default setting of True might be causing part of the problem. You have to read the documentation for any survival software to see how it treats tied event times. If you just have a few distinct event times, a discrete-time survival model (essentially a set of binomial regressions) might be better.

EdM