
I have half a year of purchase data from an e-commerce site (100K purchases by 60K customers). In the simplest A/B testing framework, randomly chosen customers got a discount on their next purchase after completing an order (disc = 0/1 below). I want to estimate how much disc influenced the interval between orders (diff).

Following the KISS principle, I simply drop all one-time customers and regress ln(diff) on disc, but don't observe any effect at all. I see two obvious problems:

  1. the data are heavily censored: 60% of customers appear only once;
  2. selection bias: frequent buyers had more chances to get a discount.
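As a sketch of how the censoring in (1) can be kept instead of dropped: build one row per inter-purchase gap and censor each customer's last gap at the end of the observation window, so one-time customers contribute a censored row rather than disappearing. The purchase log, the column names and the end_of_study date below are hypothetical, and the sketch assumes disc is recorded on the order after which the discount was granted, so it attaches to the gap that follows that order.

```r
# Hypothetical toy purchase log: one row per order. disc = 1 means the customer
# was offered a discount on the *next* purchase after this order (assumption).
purchases <- data.frame(
  customer_id = c(1, 1, 1, 2, 3, 3),
  order_date  = as.Date(c("2013-01-05", "2013-02-10", "2013-04-01",
                          "2013-03-15", "2013-02-01", "2013-02-20")),
  disc        = c(1, 0, 1, 0, 0, 1)
)
end_of_study <- as.Date("2013-06-30")  # assumed end of the observation window

# One row per gap that starts at an order; the last gap of each customer runs
# to end_date and is marked censored (event = 0).
make_gaps <- function(p, end_date) {
  p <- p[order(p$customer_id, p$order_date), ]
  out <- lapply(split(p, p$customer_id), function(d) {
    n <- nrow(d)
    next_date <- c(d$order_date[-1], end_date)
    data.frame(
      customer_id = d$customer_id,
      order_no    = seq_len(n),
      disc        = d$disc,
      diff        = as.numeric(next_date - d$order_date),  # days
      event       = c(rep(1, n - 1), 0)                    # 0 = no repeat purchase seen
    )
  })
  do.call(rbind, out)
}

gaps <- make_gaps(purchases, end_of_study)
gaps
```

Customer 2 (a one-time buyer) ends up as a single censored gap of 107 days instead of being dropped, which is the input format Surv(diff, event) expects.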

To address (1), I turn to a Cox model, coxph(Surv(diff, event) ~ disc + cluster(customer_id)) (and do observe the effect!), but I can't figure out whether it is the best way to handle multiple failure times (purchases) per customer. For (2), I'm thinking of introducing a lagged diff, but don't know how to do that with censored data.
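A minimal sketch of the lagged-diff idea in (2), on simulated data; the simulator, the 180-day window, the gamma frailty and the effect size 0.4 are all assumptions for illustration, not estimates from the real data. The useful observation is that only a customer's last gap can be censored, and the last gap is never the lag of another gap, so the lagged diff is always fully observed; the first gap of each customer has no lag and simply gets an indicator. It uses coxph, Surv and cluster() from the survival package.

```r
library(survival)

set.seed(1)
horizon <- 180  # assumed observation window in days (about half a year)

# Hypothetical simulator: one row per inter-purchase gap; the last gap is
# censored at the horizon. disc is re-randomised after every order.
sim_customer <- function(id) {
  t <- runif(1, 0, horizon)                  # date of first purchase
  frailty <- rgamma(1, shape = 2, rate = 2)  # unobserved purchase propensity
  rows <- list()
  k <- 0
  repeat {
    k <- k + 1
    disc <- rbinom(1, 1, 0.5)
    gap  <- rexp(1, rate = 0.02 * frailty * exp(0.4 * disc))  # assumed true effect
    censored <- t + gap > horizon
    rows[[k]] <- data.frame(
      customer_id = id, order_no = k, disc = disc,
      diff  = if (censored) horizon - t else gap,
      event = as.integer(!censored))
    if (censored) break
    t <- t + gap
  }
  do.call(rbind, rows)
}
gaps <- do.call(rbind, lapply(1:300, sim_customer))

# Lagged gap within customer (rows are already ordered by customer and order_no).
# A gap followed by another gap ended in a purchase, so prev_diff is never censored.
gaps$prev_diff <- ave(gaps$diff, gaps$customer_id,
                      FUN = function(x) c(NA, head(x, -1)))
gaps$first    <- as.integer(is.na(gaps$prev_diff))
gaps$log_prev <- ifelse(gaps$first == 1, 0, log(gaps$prev_diff))

# The gap-time Cox model from the question, extended with the lag;
# cluster() gives robust (sandwich) SEs for repeated gaps per customer.
fit <- coxph(Surv(diff, event) ~ disc + log_prev + first + cluster(customer_id),
             data = gaps)
summary(fit)
```

In this toy setup disc is re-randomised per order, so it is independent of the frailty; the lag mainly absorbs between-customer heterogeneity. A variant sometimes used for ordered recurrent gaps adds strata() on a capped order number so that the baseline hazard can differ by purchase count; whether that suits this data is a judgment call.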

There have been a number of relevant discussions (RFM & customer lifetime value modeling in R, Survival Analysis with Multiple Events), but I couldn't find a solution to my problems there. There is also the BTYD package, but its models don't take covariates such as disc. I guess this is a very standard question, but I can't find a step-by-step guide on CrossValidated.

RInatM
  • Perhaps I'm misunderstanding, but why not compare the Kaplan-Meier curves of the two groups? Also, why parametric (specifically Cox)? – Cam.Davidson.Pilon Oct 09 '13 at 15:28
  • I want to use the effect size to calculate the change in customer lifetime value. If our discount increases the median purchase rate by a large enough percentage, we could extend this campaign. I thought only Cox could give me such information more or less precisely, but I may be wrong – RInatM Oct 09 '13 at 15:44
  • Lots of methods are listed here: http://cran.r-project.org/web/views/Survival.html. I guess my case is "Recurrent event data". Using coxph + cluster gives me the Andersen-Gill model, but I still doubt that it fits my case – RInatM Oct 10 '13 at 11:57
  • If you are interested in estimating the impact of covariates on CLV, you might want to have a look at the R package CLVTools. Here is a tutorial on how to analyze the transactions of an apparel retailer: https://www.clvtools.com/articles/CLVTools.html – majom Nov 19 '20 at 17:31
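The two approaches raised in the comments can be sketched with the survival package on simulated data (the simulator and its parameters are hypothetical): Kaplan-Meier curves and a log-rank test per discount arm, which also report the median gap per arm, and the Andersen-Gill counting-process form of the Cox model, where each gap becomes a (start, stop] interval on time since the customer's first purchase.

```r
library(survival)

set.seed(42)
horizon <- 180  # assumed observation window in days

# Hypothetical simulated gap data: one row per inter-purchase gap,
# the last gap of each customer censored at the horizon.
sim_customer <- function(id) {
  t <- runif(1, 0, horizon)                  # date of first purchase
  frailty <- rgamma(1, shape = 2, rate = 2)  # customer-level purchase propensity
  rows <- list()
  k <- 0
  repeat {
    k <- k + 1
    disc <- rbinom(1, 1, 0.5)
    gap  <- rexp(1, rate = 0.02 * frailty * exp(0.4 * disc))
    censored <- t + gap > horizon
    rows[[k]] <- data.frame(
      customer_id = id, disc = disc,
      diff  = if (censored) horizon - t else gap,
      event = as.integer(!censored))
    if (censored) break
    t <- t + gap
  }
  do.call(rbind, rows)
}
gaps <- do.call(rbind, lapply(1:300, sim_customer))

# 1. Kaplan-Meier curves of gap length by arm, median gap per arm, log-rank test.
#    Both treat gaps as independent, ignoring repeated gaps per customer.
km <- survfit(Surv(diff, event) ~ disc, data = gaps)
summary(km)$table[, "median"]
survdiff(Surv(diff, event) ~ disc, data = gaps)

# 2. Andersen-Gill form: (start, stop] intervals on time since first purchase,
#    with robust SEs clustered by customer.
gaps$stop  <- ave(gaps$diff, gaps$customer_id, FUN = cumsum)
gaps$start <- gaps$stop - gaps$diff
ag <- coxph(Surv(start, stop, event) ~ disc + cluster(customer_id), data = gaps)
summary(ag)
```

The Kaplan-Meier comparison is assumption-light but gives no clustered standard errors; the Cox fits give a single hazard ratio for disc, which is closer to the effect-size number needed for a CLV calculation.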

0 Answers