The answer depends on whether you want (1) simply to model the duration of use of individual Products, or (2) estimate overall customer lifetime, with customers' use of various Products as predictors.
Case (1) is straightforward, if you assume that a Product is not re-activated once abandoned. I assume that time = 0 is the time that a customer acquires the product (one typically starts from time = 0 rather than time = 1), and the "event" occurs when the customer reports that the product has been abandoned. Despite your sense that Product C "is performing great in general," your Kaplan-Meier plots seem to show that Product C is abandoned much more quickly at early times than Products A or B, although things seem to even out by time = 10.* It's hard to judge without error estimates on the curves and a sense of right-censoring (if any), so a very limited number of observations on Product C (despite the overall size of your dataset) might be playing a role.
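To make the point about error estimates concrete, here is a minimal sketch of a Kaplan-Meier estimate with Greenwood standard errors, written in plain Python. The function name `kaplan_meier` and the toy durations are hypothetical, not taken from your data; in practice you would use an established survival library and check its output against something like this.

```python
import math
from collections import Counter

def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate with Greenwood standard errors.

    durations: time from product acquisition (time = 0) to abandonment or censoring
    observed:  1 if abandonment was seen, 0 if right-censored
    Returns a list of (time, n_at_risk, n_events, survival, std_err),
    one entry per distinct event time.
    """
    events = Counter(t for t, d in zip(durations, observed) if d)
    exits = Counter(durations)  # events and censorings both leave the risk set
    n_at_risk = len(durations)
    surv, greenwood, rows = 1.0, 0.0, []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            if n_at_risk > d:
                greenwood += d / (n_at_risk * (n_at_risk - d))
                se = surv * math.sqrt(greenwood)
            else:
                se = float("nan")  # survival reaches 0; Greenwood is undefined
            rows.append((t, n_at_risk, d, surv, se))
        n_at_risk -= exits[t]
    return rows

# Hypothetical toy data for one product (not from the question).
durations = [1, 1, 2, 3, 3, 5, 8, 10, 10, 10]
observed  = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
km = kaplan_meier(durations, observed)
for time, n, d, s, se in km:
    print(f"t={time:>2}  at risk={n:>2}  events={d}  S(t)={s:.3f}  se={se:.3f}")
```

With only a handful of customers still at risk, the standard errors become large relative to the differences between curves, which is why a small number of Product C observations could make its curve look worse than it is.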
For Case (2) you would need clear definitions of time = 0 for each customer, and for the time of the "event" of losing the customer.** Once you have those defined, then you could consider use of each of the Products as time-dependent covariates, maybe even considering combinations of Products as interaction terms. Such data are typically coded in a (startTime, stopTime, event) format for each combination of predictor values over time, left-truncated and (potentially) right-censored for each time interval.*** This would, for example, handle Product C being released at a time after a customer enters your study, then adopted by an established customer. This also allows for a customer to re-adopt a Product after previously abandoning it, so that the current use of all Products by a customer is related to the current risk of losing that customer.
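As an illustration of that coding, the following sketch splits one customer's follow-up into (startTime, stopTime, event) rows, with a 0/1 indicator for each Product in use during the interval. The function `to_counting_process`, the spell representation, and the example customer are all assumptions made for illustration; survival packages usually provide their own helpers for this step.

```python
def to_counting_process(entry, exit_time, lost, spells, products=("A", "B", "C")):
    """Split one customer's follow-up into intervals with constant product use.

    entry, exit_time: start and end of the customer's follow-up
    lost:             True if the customer was lost at exit_time, False if censored
    spells:           list of (product, start, stop) periods of product use
    Returns one dict per interval with startTime, stopTime, event and a
    0/1 indicator for each product.
    """
    # Every time the set of products in use changes becomes a cut point.
    cuts = {entry, exit_time}
    for _, s, e in spells:
        for t in (s, e):
            if entry < t < exit_time:
                cuts.add(t)
    cuts = sorted(cuts)

    rows = []
    for start, stop in zip(cuts, cuts[1:]):
        # The event can only occur at the end of the last interval.
        row = {"startTime": start, "stopTime": stop,
               "event": int(lost and stop == exit_time)}
        for p in products:
            row[p] = int(any(q == p and s <= start and e >= stop
                             for q, s, e in spells))
        rows.append(row)
    return rows

# Hypothetical customer: uses A, adopts C later, drops both, re-adopts A, then is lost.
rows = to_counting_process(0, 9, True, [("A", 0, 4), ("C", 2, 6), ("A", 7, 9)])
for r in rows:
    print(r)
```

Each row is left-truncated at its startTime and right-censored at its stopTime unless the event occurs there, so re-adoption of Product A simply shows up as the indicator switching back to 1 in a later row.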
*I'm assuming that these are empirical Kaplan-Meier curves rather than extrapolations from some model. It's really dangerous to try to extrapolate beyond the survival times over which you have collected data. Also, it looks like you have discrete-time data rather than the continuous-time data appropriate for things like Weibull models, so you should consider discrete-time modeling unless you can get the actual abandonment times (between the interview times) from the customers.
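One common way to set up discrete-time modeling is to expand the data into one row per customer per interview period, which can then be fed to a binary regression. The sketch below shows only that expansion and the raw per-period hazard. The names `person_period` and `discrete_hazard` and the toy data are hypothetical, and it assumes durations are counted in whole interview periods starting at 1.

```python
from collections import Counter

def person_period(durations, observed):
    """Expand (duration, observed) pairs into one row per customer per period.

    A customer with duration t contributes periods 1..t; the event flag is 1
    only in the final period, and only if abandonment was actually observed.
    """
    rows = []
    for cid, (t, d) in enumerate(zip(durations, observed)):
        for period in range(1, t + 1):
            rows.append({"id": cid, "period": period,
                         "event": int(bool(d) and period == t)})
    return rows

def discrete_hazard(rows):
    """Raw hazard per period: abandonments divided by customers at risk."""
    at_risk, events = Counter(), Counter()
    for r in rows:
        at_risk[r["period"]] += 1
        events[r["period"]] += r["event"]
    return {p: events[p] / at_risk[p] for p in sorted(at_risk)}

# Hypothetical toy data: four customers, one right-censored at period 2.
pp = person_period([1, 2, 2, 3], [1, 1, 0, 1])
hazard = discrete_hazard(pp)
print(len(pp), hazard)
```

A logistic (or complementary log-log) regression of `event` on period and covariates, fit to rows like these, is the usual discrete-time analogue of a continuous-time survival model.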
**This time for loss of a customer is not always easy to define. As a friend used to ask me: "When you're popping popcorn, how do you know when the very last kernel has popped?"
*** Left truncation means you have no information prior to the startTime of a time interval. Left censoring means you know a maximum value for an observation, just not the precise value.