I have a cross-sectional dataset in which, for each participant, I know their total exposure time and whether they experienced an event during their exposure time (1/0). However, I do not observe the time at which they experienced the event. So for instance, if I had a participant with 14 months of exposure time who experienced an event, I know they had that event sometime between 0 and 14 months of exposure but I don't know when.

It seems to me that if I had a bunch of participants with a particular exposure time (say, 10 months), then the proportion of those participants who did not have an event would be an estimate of the Kaplan-Meier curve at 10 months of exposure. This seems to suggest that some sort of smoothing estimate (e.g. LOESS smoothing) could estimate a Kaplan-Meier curve from my data -- the data being smoothed here would take x values of the exposure times and y-values of 1 for event-free participants and 0 for participants with events. A clear downside to smoothing is that the estimate is not guaranteed to be monotone.
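To make the idea in the paragraph above concrete, here is a minimal Python sketch (not part of the original post). It computes the event-free proportion at each distinct exposure time and then forces the result to be non-increasing with the pool-adjacent-violators algorithm, which replaces a LOESS-style smoother with a monotone one. The function name `monotone_survival` and the toy data are made up for illustration, and the sketch assumes exposure time is unrelated to event risk.

```python
def monotone_survival(exposure, event):
    """Monotone estimate of the event-free proportion by exposure time.

    exposure: list of exposure times; event: list of 0/1 indicators.
    Returns (times, survival), with survival non-increasing in time.
    """
    # Pool participants who share an exposure time: time -> (n_event_free, n_total).
    groups = {}
    for t, d in zip(exposure, event):
        free, total = groups.get(t, (0, 0))
        groups[t] = (free + (1 - d), total + 1)
    times = sorted(groups)

    # Pool-adjacent-violators: merge neighbouring blocks whenever the
    # event-free proportion would otherwise increase with exposure time.
    blocks = []  # each block: [n_event_free, n_total, n_distinct_times]
    for t in times:
        free, total = groups[t]
        blocks.append([free, total, 1])
        while len(blocks) > 1 and (
            blocks[-2][0] * blocks[-1][1] < blocks[-1][0] * blocks[-2][1]
        ):
            free, total, width = blocks.pop()
            blocks[-1][0] += free
            blocks[-1][1] += total
            blocks[-1][2] += width

    # Expand each pooled block back to one value per distinct exposure time.
    survival = []
    for free, total, width in blocks:
        survival.extend([free / total] * width)
    return times, survival


# Hypothetical toy data: exposure in months, event = 1 if an event occurred.
exposure = [2, 2, 5, 5, 8, 8, 8, 12]
event = [0, 0, 0, 1, 0, 0, 1, 1]
times, survival = monotone_survival(exposure, event)
print(list(zip(times, survival)))
```

On this toy data the raw event-free proportions at 2, 5, 8 and 12 months are 1, 1/2, 2/3 and 0; the violation between 5 and 8 months is pooled, giving 1, 0.6, 0.6 and 0.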

Are there standard approaches that can be used to estimate a Kaplan-Meier curve with this sort of data?

josliber
  • Search for interval censoring. A popular R package to do this is icenReg, specifically the `ic_np` function – Cam.Davidson.Pilon Aug 06 '19 at 01:35
  • @Cam.Davidson.Pilon thanks -- this is exactly what I was looking for! Care to write an answer? – josliber Aug 06 '19 at 02:10
  • This is very similar to https://stats.stackexchange.com/q/202348/76981. To be more specific than interval censoring, you are looking at *current status data*. – Cliff AB Aug 06 '19 at 03:57
  • @CliffAB thanks -- that is a great duplicate, and I have marked this as a duplicate here. – josliber Aug 06 '19 at 11:30

1 Answer

This is called interval-censored data: the true (unobserved) event time for each individual lies within an (observed) interval. There are a number of ways to deal with this. Knowing nothing else, you could take a multiple imputation approach in which the imputation model is simply a discrete uniform variable covering that interval. In other words, take your dataset and, for each individual who had an event, randomly generate an event time between 0 and that individual's total exposure time; individuals without an event stay censored at their exposure time. Do this a number of times (say, ten) and fit a KM curve to each imputed dataset. You can then "average" the ten Kaplan-Meier curves (i.e. take the average at each time point) to get an estimate of the true KM curve. Standard errors (and a confidence band around your KM curve) can be computed too if you need those.

Each imputed KM curve is monotone, and so is their pointwise average, so this approach avoids the monotonicity issue you bring up.
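The imputation procedure described above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not a vetted implementation: the helper names `kaplan_meier` and `imputed_km` and the toy data are made up, event times are drawn from a continuous uniform distribution for simplicity, and participants without an event are treated as right-censored at their exposure time. As the comments under this answer point out, uniform imputation can be heavily biased, so treat this only as a demonstration of the mechanics.

```python
import random


def kaplan_meier(times, events, grid):
    """Kaplan-Meier survival estimate evaluated at each point of `grid`."""
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    curve = []
    for g in grid:
        s = 1.0
        for u in event_times:
            if u > g:
                break
            at_risk = sum(1 for t in times if t >= u)
            deaths = sum(1 for t, d in zip(times, events) if t == u and d == 1)
            s *= 1.0 - deaths / at_risk
        curve.append(s)
    return curve


def imputed_km(exposure, event, grid, n_imputations=10, seed=0):
    """Average KM curves over datasets with uniformly imputed event times."""
    rng = random.Random(seed)
    total = [0.0] * len(grid)
    for _ in range(n_imputations):
        # Event observed: draw a time uniformly between 0 and the exposure time.
        # No event (assumption): keep the exposure time as a censoring time.
        times = [
            rng.uniform(0, x) if d == 1 else x
            for x, d in zip(exposure, event)
        ]
        curve = kaplan_meier(times, event, grid)
        total = [a + b for a, b in zip(total, curve)]
    return [a / n_imputations for a in total]


# Hypothetical toy data: exposure in months, event = 1 if an event occurred.
exposure = [2, 2, 5, 5, 8, 8, 8, 12]
event = [0, 0, 0, 1, 0, 0, 1, 1]
grid = [0, 3, 6, 9, 12]
avg_curve = imputed_km(exposure, event, grid)
print(list(zip(grid, avg_curve)))
```

Averaging is done pointwise on a fixed grid of time points, so every imputed curve is evaluated at the same places before the curves are combined.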

kenny
  • Unfortunately, this form of multiple imputation can easily lead to heavy bias! If the inspection time tends to occur much later than the event time, uniformly sampling between zero and the inspection time will cause heavy upward bias in the estimated distribution of event times. – Cliff AB Aug 06 '19 at 04:09
  • @cliffAB great point! I wrote "knowing nothing else" to justify using a uniform distribution, but in reality I agree with you that this may lead to large bias. josliber, I would use the answer linked to in the comments above. – kenny Aug 06 '19 at 19:40