
I was hoping someone could help me with this problem in the Cox proportional hazards model.

I am given the following setup.

$T$ is a non-negative random variable with a continuous distribution and hazard function $\lambda_T(t)$. $T$ has density $f_T(t) = \lambda_T(t) S(t)$, where $S(t) = P(T>t)$ is the survival function, and $F(t) = P(T \leq t)$ is the distribution function.

Suppose I have $n$ observations of $T$, with no censoring. Can anyone tell me how to arrive at an $\textbf{efficient influence function}$ for $S(t_0)$, where $t_0$ is a fixed time point?

Note that $\sqrt{n} ( \hat{S(t_0)} - S(t_0) ) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \phi(T_i) + o_p(1)$, where $\phi(T_i)$ is the influence function and $o_p(1)$ is a term converging to zero in probability. This should lead to an efficient estimator $\hat{S}$.

nalen
  • is there a particular text book you are working from in regards to this? – phdmba7of12 Apr 14 '21 at 18:46
  • what does $O_p$ here represent? – phdmba7of12 Apr 14 '21 at 18:51
  • $O_p$ is a term converging to zero. – nalen Apr 14 '21 at 18:54
  • No not a particular book actually. – nalen Apr 14 '21 at 18:55
  • As this is a Cox regression model, are you asking about estimating the influence of removing one event time-point on the baseline survival function? Put another way, would the individual with the event at that time point still be included at prior times (effectively censored at what was really the event time)? – EdM Apr 14 '21 at 21:54
  • @EdM sorry I believe there was an error. $t_0$ is just a fixed time point, and is in both terms of $S$. – nalen Apr 15 '21 at 11:50
  • Are you asking this in the context of a Cox model with multiple predictors? In that case, the "observation" at $T_i$ isn't just an observation of a time value but also the multi-dimensional set of associated covariates both for the case having the event and all the other cases at risk at $T_i$. Or are you asking about a simpler situation where all the information is in the event times, like with a single Kaplan-Meier curve? – EdM Apr 15 '21 at 16:06
  • @EdM Yeah the simpler situation, thank you!. I put Cox model in there, but essentially it's just a proportional hazards model without censoring. I have arrived at the Kaplan Meier estimate, but I don't know how to use the efficiency, and derive the Kaplan Meier estimate from this. I.e. I need some likelihood argument or something I think. I am to use the efficient influence function to derive the estimate, not the other way around. – nalen Apr 15 '21 at 16:16
  • @nalen even a "proportional hazards model without censoring" implies at least 1 covariate and 2 groups with different survival curves. So are you just asking about a single underlying continuous survival curve $S(t)$ estimated as $\hat S(t)$ from $n$ event times $T_i$? – EdM Apr 15 '21 at 16:35
  • @EdM Yes correct. I wasn't aware of this. – nalen Apr 15 '21 at 16:38
  • Do you think $\hat{S}(t_0)$ would be more appropriate than $\hat{S(t_0)}$? – The Pointer Apr 15 '21 at 18:13

1 Answer


I'm not much of an expert on influence functions; I'll start with the working definition provided in this answer by Michael Chernick: "The influence function for a parameter...essentially measures the difference between the parameter estimate when the data point is included compared with when it is left out."

In your case you want to know how removing particular event times from the observation set (maybe more precisely, making small changes in observed event times $T_i$) affects an estimate of survival at a particular time, $\hat S(t_0)$. In your situation with a non-parametric survival function estimate,* that might be the Kaplan-Meier estimate, or the survival function derived from the Nelson-Aalen estimate of cumulative hazard. So ask yourself the following questions:

If $T_i > t_0$, is $\hat S(t_0)$ affected if you omit observation $i$ or make a (small) change in its observed time?

If $T_i = t_0$ (an event perhaps of 0 probability in principle, but maybe of some practical interest), what happens to $\hat S(t_0)$ if you omit observation $i$ or make a (small) change in its observed time?

If $T_i < t_0$, what happens to $\hat S(t_0)$ if you omit observation $i$ or make a (small) change in its observation time?
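One way to explore these three cases numerically is a leave-one-out check. The sketch below assumes no censoring and no tied times, in which case the Kaplan-Meier estimate at $t_0$ is just the fraction of observed times greater than $t_0$. The function names and the toy data are made up for illustration.

```python
import numpy as np

def empirical_survival(times, t0):
    """Fraction of observed event times strictly greater than t0.

    With no censoring and no ties this matches the Kaplan-Meier estimate.
    """
    times = np.asarray(times, dtype=float)
    return float(np.mean(times > t0))

def leave_one_out_change(times, t0):
    """S_hat(t0) on the full sample minus S_hat(t0) with observation i dropped."""
    times = np.asarray(times, dtype=float)
    full = empirical_survival(times, t0)
    return np.array([
        full - empirical_survival(np.delete(times, i), t0)
        for i in range(len(times))
    ])

# Hypothetical event times; t0 is the fixed time point of interest.
times = np.array([0.5, 1.2, 2.0, 3.1, 4.7])
t0 = 2.5

s_hat = empirical_survival(times, t0)
changes = leave_one_out_change(times, t0)

for t, c in zip(times, changes):
    side = "T_i > t0" if t > t0 else "T_i < t0"
    print(f"T_i = {t:.1f} ({side}): change in S_hat(t0) = {c:+.3f}")
```

In this toy sample the change depends only on which side of $t_0$ the dropped time falls, not on its exact value. Multiplying each change by $n - 1$ gives $1\{T_i > t_0\} - \hat S(t_0)$.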

The Wikipedia entry on the Kaplan-Meier estimator shows the derivation of the Kaplan-Meier estimate based on maximum likelihood, which might help put the above into a more formal argument.
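As a rough sketch of how such a formal argument might go (assuming no censoring and a fully non-parametric model for $T$, as clarified in the comments), the Kaplan-Meier estimate reduces to the empirical survival function:

$$\hat S(t_0) = \frac{1}{n} \sum_{i=1}^{n} 1\{T_i > t_0\}.$$

Subtracting $S(t_0)$ and scaling by $\sqrt{n}$ gives an exact linear representation, with no remainder term:

$$\sqrt{n} \left( \hat S(t_0) - S(t_0) \right) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left( 1\{T_i > t_0\} - S(t_0) \right),$$

so the influence function is $\phi(T) = 1\{T > t_0\} - S(t_0)$, which has mean zero and variance $S(t_0)\,(1 - S(t_0))$. To see that it is the efficient one, take a parametric submodel $f_\epsilon(t) = f_T(t)\,(1 + \epsilon h(t))$ with $E[h(T)] = 0$; then

$$\frac{d}{d\epsilon} S_\epsilon(t_0) \Big|_{\epsilon = 0} = E\left[ 1\{T > t_0\}\, h(T) \right] = E\left[ \left( 1\{T > t_0\} - S(t_0) \right) h(T) \right].$$

Because the scores $h$ range over all mean-zero square-integrable functions in the non-parametric model, $\phi$ already lies in the tangent space and is therefore the efficient influence function.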

Although you ask in the context of no censoring, also consider what happens to $\hat S(t_0)$ if there are small changes in censoring times that aren't close to $t_0$.
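A small numerical check can make the censoring case concrete. The sketch below is a bare-bones product-limit calculation written only for illustration (the function name and the toy data are hypothetical, and it is not meant to replace a survival analysis library). It compares $\hat S(t_0)$ before and after shifting censoring times.

```python
import numpy as np

def kaplan_meier_at(times, events, t0):
    """Product-limit estimate of S(t0).

    times  : observed times (event or censoring)
    events : True where the time is an event, False where it is censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    surv = 1.0
    # Loop over distinct event times up to and including t0.
    for t in np.unique(times[events]):
        if t > t0:
            break
        at_risk = np.sum(times >= t)             # still under observation at t
        deaths = np.sum((times == t) & events)   # events exactly at t
        surv *= 1.0 - deaths / at_risk
    return float(surv)

# Hypothetical data: the observations at 1.0 and 3.0 are censored.
times = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 4.0])
events = np.array([True, False, True, True, False, True])
t0 = 2.5

base = kaplan_meier_at(times, events, t0)

# Small shift of the early censoring time, staying between the same event times.
small_shift = times.copy()
small_shift[1] = 1.1

# Small shift of the censoring time that lies beyond t0.
late_shift = times.copy()
late_shift[4] = 3.2

# Larger shift that moves the censoring time past the event at 1.5.
crossing_shift = times.copy()
crossing_shift[1] = 1.6

print(base)
print(kaplan_meier_at(small_shift, events, t0))
print(kaplan_meier_at(late_shift, events, t0))
print(kaplan_meier_at(crossing_shift, events, t0))
```

In this toy example the first two shifts leave every risk set at the event times up to $t_0$ unchanged, so the estimate does not move. The third shift adds one subject to the risk set at the event time 1.5, which does change the estimate.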


*Although the question was originally posed in terms of a Cox regression, discussion in comments clarified that the question is about a non-parametric estimate of a single survival curve. A "semi-parametric" Cox regression makes no parametric assumptions about the baseline hazard, with parametric modeling of covariate effects on hazard. If the "influence function" is defined in terms of small changes in observed event times with unaltered covariate values, this type of argument can be extended to Cox models. In Cox models, however, the "influence" of interest is generally in how each of $n$ individual cases, with associated covariate values, affects estimates of each of the $p$ regression coefficients.

EdM