Hazard rates (dividing by timedelta, censored times) #1592

shilet · 2024-01-19T10:12:43Z

When using the NelsonAalenFitter to determine the hazard rate, I came across the following two issues:

In calculating the instantaneous hazard, the time between two events is not used. Normally h0 = d_j/(n_j * T_j), where d_j is the number of deaths, n_j the number at risk, and T_j the time between t_j and t_(j+1).
The hazard rate is also determined at censored times, where it is then zero. To my knowledge, hazard rates are not determined at censored times.

For a simple example see:
import pandas as pd
from lifelines import NelsonAalenFitter

data = {
    'duration': [2, 4, 6, 7, 8],
    'event': [1, 1, 0, 1, 1],  # 1 = event observed, 0 = censored
}

df = pd.DataFrame(data)
naf = NelsonAalenFitter()
naf.fit(df['duration'], df['event'])
naf.plot_hazard(bandwidth=1, ci_show=False, label='NA hazard lifelines')
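
For comparison, here is a minimal hand-computed sketch of the hazard I would expect for the same data, using h0 = d_j/(n_j * T_j) and skipping censored times. The helper name `interval_hazard` is my own and not part of lifelines. I take T_j to be the gap to the next event time, so the last event time gets no value.

```python
durations = [2, 4, 6, 7, 8]
events = [1, 1, 0, 1, 1]  # 1 = event observed, 0 = censored


def interval_hazard(durations, events):
    """Return (t_j, d_j, n_j, T_j, hazard) tuples at distinct event times only."""
    # Censored times (event == 0) never appear in this list.
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    rows = []
    # The last event time has no following event, so T_j is undefined there.
    for j, t in enumerate(event_times[:-1]):
        d = sum(1 for ti, e in zip(durations, events) if ti == t and e == 1)
        n = sum(1 for ti in durations if ti >= t)  # still at risk just before t
        width = event_times[j + 1] - t             # T_j = t_(j+1) - t_j
        rows.append((t, d, n, width, d / (n * width)))
    return rows


rows = interval_hazard(durations, events)
for t, d, n, width, h in rows:
    print(f"t={t}: d={d}, n={n}, T={width}, hazard={h:.4f}")
```

With this convention the censored observation at t=6 only reduces the number at risk for later event times; it does not produce a hazard value of its own.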

The same problem occurs in the CoxPHFitter: when determining baseline_hazard_, there is no division by the time difference, and the hazard is also determined at censored times.

I was also wondering why the calculation of the hazard rate based on the Kaplan-Meier event table is not implemented, as this seems to me the most straightforward implementation.
