I have a revenue metric for two distinct groups of users (groups A and B in an A/B test), and the distribution of revenue across users is definitely not Normal. To obtain a random variable that is normally distributed, I bootstrapped the dataset N times for each group separately. I am now running a Wald test to compare the two means (the ARPU, average revenue per user, of each bootstrapped dataset), and with a large enough N the collection of ARPUs ends up looking Normal. Is this the right way to proceed? Should I be using the variance of this new random variable (the collection of ARPUs)?
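For concreteness, here is a minimal sketch of the resampling step described above, in Python with NumPy. The revenue data is simulated, and `simulate_revenue`, `bootstrap_arpu_diff`, the 5% paying rate and the spend levels are all hypothetical stand-ins, not real figures. Note that the sketch does not run a Wald test per replicate; it summarizes the bootstrap replicates of the ARPU difference with a percentile interval, and compares their standard deviation with the usual analytic standard error of a difference in means.

```python
import numpy as np

def simulate_revenue(n, mean_spend, rng, pay_rate=0.05):
    """Hypothetical zero-inflated revenue: most users pay nothing."""
    pays = rng.random(n) < pay_rate
    return np.where(pays, rng.exponential(mean_spend, size=n), 0.0)

def bootstrap_arpu_diff(rev_a, rev_b, n_boot=2000, seed=0):
    """Bootstrap the ARPU difference (B minus A), resampling each group separately."""
    rng = np.random.default_rng(seed)
    rev_a = np.asarray(rev_a, dtype=float)
    rev_b = np.asarray(rev_b, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample users with replacement, keeping each group's original size.
        mean_a = rng.choice(rev_a, size=rev_a.size, replace=True).mean()
        mean_b = rng.choice(rev_b, size=rev_b.size, replace=True).mean()
        diffs[i] = mean_b - mean_a
    return diffs

rng = np.random.default_rng(42)
rev_a = simulate_revenue(5000, 20.0, rng)  # assumed control group
rev_b = simulate_revenue(5000, 30.0, rng)  # assumed treatment group

diffs = bootstrap_arpu_diff(rev_a, rev_b)
observed = rev_b.mean() - rev_a.mean()
ci_low, ci_high = np.percentile(diffs, [2.5, 97.5])

# The spread of the bootstrap replicates estimates the standard error
# of the ARPU difference; compare it with the analytic formula.
boot_se = diffs.std(ddof=1)
analytic_se = np.sqrt(rev_a.var(ddof=1) / rev_a.size + rev_b.var(ddof=1) / rev_b.size)

print(f"observed ARPU difference: {observed:.3f}")
print(f"95% percentile interval: [{ci_low:.3f}, {ci_high:.3f}]")
print(f"bootstrap SE: {boot_se:.3f}, analytic SE: {analytic_se:.3f}")
```

The standard deviation of the bootstrapped ARPUs estimates the standard error of the sample mean, not the variance of per-user revenue, so it should not be divided by the sample size again.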

RafaJM
  • No. See http://blog.analytics-toolkit.com/2017/statistical-significance-non-binomial-metrics-revenue-time-site-pages-session-aov-rpu/ Assuming you have sufficient users in each group, the ARPU should be approximately normally distributed by the central limit theorem. Use your bootstrap to determine the distribution of the sample mean in your two groups (i.e. N = sample size). Assuming it's big enough (I assume we are talking about at least hundreds of users), you can just use a z-test. – seanv507 Feb 17 '21 at 20:53
  • I wouldn't mix resampling with asymptotic methods when there is no need. A simpler option is a permutation/randomization test, which makes a lot of sense here and makes no distributional assumptions. Example code: https://github.com/stefgehrig/perm_test_ci – stefgehrig Feb 17 '21 at 20:57
  • The thing is that I don't see how adding users makes ARPU a normally distributed random variable. The vast majority of users in my case generate 0 revenue or very little revenue, so the distribution is at best a very "aggressive" form of exponential. – RafaJM Feb 17 '21 at 23:09
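The two suggestions in the comments can be sketched side by side. This is an illustrative Python/NumPy sketch and is not the code from the linked repository; `z_test_means`, `permutation_test_means`, `simulate_revenue` and the simulated zero-inflated revenue are assumptions made for the example. The z-test relies on the sample means (not the per-user revenues) being approximately Normal, while the permutation test builds its null distribution by shuffling group labels.

```python
import numpy as np
from math import erfc, sqrt

def z_test_means(x, y):
    """Two-sided z-test for a difference in means with unequal variances."""
    se = sqrt(x.var(ddof=1) / x.size + y.var(ddof=1) / y.size)
    z = (y.mean() - x.mean()) / se
    # Two-sided Normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return z, erfc(abs(z) / sqrt(2))

def permutation_test_means(x, y, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = y.mean() - x.mean()
    count = 0
    for _ in range(n_perm):
        # Shuffle the pooled revenues, then split back into two groups
        # of the original sizes, as if labels were assigned at random.
        perm = rng.permutation(pooled)
        diff = perm[x.size:].mean() - perm[:x.size].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # The +1 terms count the observed assignment as one of the permutations.
    return observed, (count + 1) / (n_perm + 1)

def simulate_revenue(n, mean_spend, rng, pay_rate=0.05):
    """Hypothetical zero-inflated revenue: most users pay nothing."""
    pays = rng.random(n) < pay_rate
    return np.where(pays, rng.exponential(mean_spend, size=n), 0.0)

rng = np.random.default_rng(7)
rev_a = simulate_revenue(5000, 20.0, rng)  # assumed control group
rev_b = simulate_revenue(5000, 30.0, rng)  # assumed treatment group

z, p_z = z_test_means(rev_a, rev_b)
diff, p_perm = permutation_test_means(rev_a, rev_b)
print(f"ARPU difference: {diff:.3f}")
print(f"z-test: z = {z:.2f}, p = {p_z:.4f}")
print(f"permutation test: p = {p_perm:.4f}")
```

If the two p-values disagree noticeably on real data, that suggests the sample is too small or too skewed for the Normal approximation of the mean to be trusted, and the permutation result is the safer one to report.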
