How to test the difference in two proportions when the outcomes aren't binary?

Question

An e-commerce website is testing two different designs for a checkout page. Customers who visit the checkout page are randomly presented with one of the two designs.

The first metric of interest, sales uplift, can be measured by comparing the proportion of customers that finalized the sale (a binary outcome) for each of the two designs.

It is reasonably straightforward to compare these using a test for two proportions.
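As a minimal sketch of such a test, base R's prop.test can be used with the continuity correction turned off. The counts and the names purchases and visitors below are hypothetical, chosen only for illustration; the hand-computed pooled z statistic is included to show that both routes give the same two-sided p-value.

```r
# Hypothetical counts, for illustration only (not taken from real data)
purchases <- c(design_A = 60, design_B = 45)    # customers who finalized a sale
visitors  <- c(design_A = 500, design_B = 500)  # customers shown each design

# Two-sided test of equal proportions; correct = FALSE disables the
# continuity correction, so the statistic is the square of the pooled z
result <- prop.test(x = purchases, n = visitors, correct = FALSE)
print(result$p.value)

# The same p-value from the pooled z statistic computed by hand
p_hat <- purchases / visitors
p_bar <- sum(purchases) / sum(visitors)
z <- unname((p_hat[1] - p_hat[2]) / sqrt(p_bar * (1 - p_bar) * sum(1 / visitors)))
p_manual <- 2 * pnorm(-abs(z))
print(p_manual)
```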

A second metric of interest is the dollar conversion. This is the final dollar value of the sale, as a proportion of the initial dollar value of the incoming shopping cart.

For example: A customer comes to the checkout page with $160 worth of items in the cart (initial value). The customer removes some items from the cart and finalizes the sale for $40 worth of items (final value). The sales conversion is 100% (we still sold the customer something), but the dollar conversion is only 25%.

How can I properly test the difference in dollar conversion for the two groups against a null hypothesis of no difference?

See below for some R code specifying the problem:

# example data
set.seed(1)
total_customers <- 1000
target_control <- rbinom(total_customers, 1, 0.5)
sale_success <- rbinom(total_customers, 1, 0.1)
initial_value <- rexp(total_customers, rate=0.1) 
final_value <- runif(total_customers, 0, 1.1) * initial_value * sale_success
sales_data <- data.frame(target_control, sale_success, initial_value, final_value)

# sales conversion - test for two proportions (two-tailed)
n1 <- sum(target_control)
n2 <- sum(!target_control)
p1 <- sum(sales_data[target_control==1,"sale_success"])/n1
p2 <- sum(sales_data[target_control==0,"sale_success"])/n2
pbar <- (p1*n1+p2*n2)/(n1+n2)
z <- (p1-p2)/sqrt(pbar*(1-pbar)/n1+pbar*(1-pbar)/n2)
pval <- 2*(1-pnorm(abs(z)))

# dollar conversion - ??
p1 <- sum(sales_data[target_control==1,"final_value"])/sum(sales_data[target_control==1,"initial_value"])
p2 <- sum(sales_data[target_control==0,"final_value"])/sum(sales_data[target_control==0,"initial_value"])

Some things to consider:

Initial value & final value are correlated
Initial value & final value both follow a long-tailed distribution, e.g. the negative exponential distribution
Sometimes final value will be greater than initial value, e.g. the customer adds more to the cart before finalizing the sale
Sale success & initial value are correlated, but I haven't specified this in the example code
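One way the last point could be built into simulated data is sketched below. The logistic form, its coefficients, and the direction of the effect (larger carts convert less often) are all assumptions made for illustration, not properties of the real data.

```r
# Sketch only: the logistic coefficients below are arbitrary assumptions
set.seed(1)
total_customers <- 1000
initial_value <- rexp(total_customers, rate = 0.1)

# Assumed direction: the probability of a sale falls as the cart value grows
p_sale <- plogis(-1.5 - 0.05 * initial_value)
sale_success <- rbinom(total_customers, 1, p_sale)

# Sample correlation between sale success and initial cart value
print(cor(sale_success, initial_value))
```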

Update 1: Brumar has suggested that the customer-level change in behavior, for those customers who do finalize a sale, can be compared using a Wilcoxon rank-sum test:

sales_data$ratios=final_value/initial_value
ratios_A=sales_data$ratios[sale_success==1 & target_control==0]
ratios_B=sales_data$ratios[sale_success==1 & target_control==1]
wilcox.test(ratios_A,ratios_B)

I'm still interested to know if there is any way to compare the difference in the overall dollar conversion, i.e. the sum of final values over the sum of initial values?

Update 2: Solved by Brumar.

# permutation test (two-tailed)
p1 <- sum(sales_data[target_control==1 & sale_success==1,"final_value"])/sum(sales_data[target_control==1 & sale_success==1,"initial_value"])
p2 <- sum(sales_data[target_control==0 & sale_success==1,"final_value"])/sum(sales_data[target_control==0 & sale_success==1,"initial_value"])
yourGap<-p1-p2
L<-sales_data[,"target_control"]==1
LfilterOnlyBuyers<-sales_data[,"sale_success"]==1

nulldist <- vector(mode="numeric", length=10000)
for ( i in 1:10000) {
    Lperm <- sample(L) 
    LpermInv <- !Lperm & LfilterOnlyBuyers
    Lperm <- Lperm & LfilterOnlyBuyers

    p1_perm <- sum(sales_data[Lperm,"final_value"])/sum(sales_data[Lperm,"initial_value"])
    p2_perm <- sum(sales_data[LpermInv,"final_value"])/sum(sales_data[LpermInv,"initial_value"])
    nulldist[i] = p1_perm-p2_perm
}
# two-tailed: compare absolute gaps, since yourGap itself may be negative
pvalue=sum(abs(nulldist) >= abs(yourGap))/10000
alpha=0.05
ci_upper <- yourGap + quantile(nulldist, (1-alpha/2))
ci_lower <- yourGap - quantile(nulldist, (1-alpha/2))

brumar · Accepted Answer · 2015-06-25T06:39:43.270

The first part does indeed seem reasonable.
Concerning the second part, I think these ratios can't be handled as proportions because they do not come from binary events. This implies that we have no idea how these ratios are distributed, which rules out a z-test (unless the empirical distribution happens to be normal, which you said is not the case).

First Proposition : Wilcoxon test.
My suggestion would be to simply compare these ratios with a Wilcoxon test.

sales_data$ratios=final_value/initial_value
ratios_A=sales_data$ratios[sale_success==1 & target_control==0]
ratios_B=sales_data$ratios[sale_success==1 & target_control==1]
wilcox.test(ratios_A,ratios_B)

On second thought, this test treats "1 dollar bought out of 5 dollars selected" the same as "100 dollars bought out of 500 dollars selected". I can understand why you want to avoid that and prefer your overall ratio.
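The difference between the two measures can be seen on a toy case. The two carts below reuse the 5-dollar and 500-dollar figures from the sentence above; the second set of final values is a made-up variation for illustration.

```r
# Two hypothetical buyers: a 5-dollar cart and a 500-dollar cart
initial <- c(5, 500)
final   <- c(1, 100)

# Per-customer ratios are identical, so a rank-based test sees no difference
print(final / initial)              # 0.2 0.2

# Made-up variation: only the large cart changes its final value
final2 <- c(1, 400)
print(mean(final2 / initial))       # mean of per-customer ratios: 0.5
print(sum(final2) / sum(initial))   # ratio of sums: about 0.794
```

The per-customer ratios weight every buyer equally, while the ratio of sums weights each buyer by cart size, so the two can move quite differently.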

Second Proposition : Permutation test.

To keep your measure, I would suggest crafting "your own test" as a permutation test. Under the null hypothesis, being in the control or the treatment group makes no difference. The idea is then to randomly relabel which subjects are in the control or treatment group, and to count how many times a random permutation yields a gap between the two groups that is at least as large as the one you originally measured. The gap is the difference between the two groups' ratios. Dividing this count by the number of permutations gives a p-value.

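# group labels and buyer filter as plain logical vectors
L<-sales_data[,"target_control"]==1
LfilterOnlyBuyers<-sales_data[,"sale_success"]==1

# observed dollar conversion per group, buyers only (same filter as in the loop)
p1 <- sum(sales_data[L & LfilterOnlyBuyers,"final_value"])/sum(sales_data[L & LfilterOnlyBuyers,"initial_value"])
p2 <- sum(sales_data[!L & LfilterOnlyBuyers,"final_value"])/sum(sales_data[!L & LfilterOnlyBuyers,"initial_value"])
yourGap<-abs(p1-p2)

count=0
for ( i in 1:10000) {
  # relabel customers at random, then keep only the buyers in each relabelled group
  Lperm=sample(L)
  LpermA <- Lperm & LfilterOnlyBuyers
  LpermB <- !Lperm & LfilterOnlyBuyers
  p1_perm <- sum(sales_data[LpermA,"final_value"])/sum(sales_data[LpermA,"initial_value"])
  p2_perm <- sum(sales_data[LpermB,"final_value"])/sum(sales_data[LpermB,"initial_value"])
  # count permutations whose gap is at least as large as the observed one
  if (abs(p1_perm-p2_perm)>=yourGap) {
    count=count+1
  }
}
pvalue=count/10000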

Use absolute values when you compute the gap if you want a two-tailed test.

In this permutation test, as in the Wilcoxon test I suggested, I filtered out non-buyers. The idea is that you don't want the first hypothesis you tested to come into play here. You may prefer to separate the two hypotheses: 1) they buy less/more often; 2) when they buy, the final amount is closer to/farther from the initial amount selected.
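The choice between filtering and not filtering can be made explicit by wrapping the test in a function and passing it whichever subset is of interest. This is a sketch only: perm_test_ratio is a hypothetical helper name, the data are simulated along the lines of the question, and the add-one correction to the p-value is a common convention rather than something from the code above. Note that in the buyers-only call the labels are permuted among buyers only, which is a slight variation on permuting all customers and filtering afterwards.

```r
# Hypothetical helper: permutation test for a difference in ratio of sums
perm_test_ratio <- function(final, initial, group, n_perm = 2000) {
  ratio_gap <- function(g) {
    sum(final[g]) / sum(initial[g]) - sum(final[!g]) / sum(initial[!g])
  }
  observed <- ratio_gap(group)
  # gaps obtained after randomly relabelling the customers
  null_gaps <- replicate(n_perm, ratio_gap(sample(group)))
  # add-one correction keeps the estimated p-value away from exactly zero
  p_value <- (sum(abs(null_gaps) >= abs(observed)) + 1) / (n_perm + 1)
  list(observed = observed, p_value = p_value)
}

# Simulated data in the spirit of the question (assumed parameters)
set.seed(1)
n <- 1000
group   <- rbinom(n, 1, 0.5) == 1
buyer   <- rbinom(n, 1, 0.1) == 1
initial <- rexp(n, rate = 0.1)
final   <- runif(n, 0, 1.1) * initial * buyer

# 1) all customers: mixes "buy less often" with "spend less per cart"
all_res <- perm_test_ratio(final, initial, group)
# 2) buyers only: isolates the change in cart value among those who bought
buy_res <- perm_test_ratio(final[buyer], initial[buyer], group[buyer])
print(c(all_customers = all_res$p_value, buyers_only = buy_res$p_value))
```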

Thank you for your answer. If I understand correctly, this approach separates out those customers who did finalize the sale, and asks whether there was a change in behaviour for these customers. This is a useful way to look at the problem, and something I hadn't properly considered. It would still be interesting to know, is there some way to test the difference in the overall dollar conversion? That is, the sum of finalized sales over the sum of initial values? — logworthy, Jun 24 '15 at 03:43
Thanks for your comment. I had a second look on your question and edited my answer with a new solution. I also explain why I selected buyers, even if I see you picked that up properly. — brumar, Jun 24 '15 at 06:25
Oh! Very interesting :) I made a few changes and have added the updated code to the question: 1. p1 & p2 need to filter for only buyers as well; 2. your syntax for setting L and LfilterOnlyBuyers didn't work for me; 3. filtering for buyers needs to happen after the inversion of Lperm. Also, if I understand correctly, I can store the distribution and use its quantiles to create a confidence interval? — logworthy, Jun 24 '15 at 23:34
Oh, indeed! Thanks for picking up my mistakes. I have updated my answer accordingly. I don't think you can center the null distribution on your observed value. Maybe you are thinking of the bootstrap procedure, which would indeed allow the computation of a confidence interval. Another note: I think that for a two-tailed test, yourGap must be an absolute value too (if it's negative you have no chance of rejecting the null). — brumar, Jun 25 '15 at 06:46
@brumar could you have a look here ? thx ! https://stats.stackexchange.com/questions/398436/a-b-testing-ratio-of-sums — Xavier Bourret Sicotte, Mar 20 '19 at 20:34
Hello. I think you already know what I would answer ahah. Permutations tests are flexible enough to provide you a very simple way to answer your problem. I kind of stopped contributing on crossvalidated but I'll give you a short answer. — brumar, Mar 22 '19 at 16:03
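
The bootstrap mentioned in the comments could look roughly like the sketch below, which builds a percentile interval for the difference in dollar conversion among buyers. The data are simulated with assumed parameters, dollar_conv is a hypothetical helper name, and the percentile method is only one of several ways to form a bootstrap interval.

```r
# Simulated data in the spirit of the question (assumed parameters)
set.seed(1)
n <- 1000
treated <- rbinom(n, 1, 0.5) == 1
buyer   <- rbinom(n, 1, 0.1) == 1
initial <- rexp(n, rate = 0.1)
final   <- runif(n, 0, 1.1) * initial * buyer

# Hypothetical helper: dollar conversion for a set of customer indices
dollar_conv <- function(idx) sum(final[idx]) / sum(initial[idx])

idx_a <- which(treated & buyer)    # buyers who saw the new design
idx_b <- which(!treated & buyer)   # buyers who saw the control design
observed_gap <- dollar_conv(idx_a) - dollar_conv(idx_b)

# Resample buyers with replacement within each group and recompute the gap
boot_gaps <- replicate(2000, {
  dollar_conv(sample(idx_a, replace = TRUE)) -
    dollar_conv(sample(idx_b, replace = TRUE))
})

# 95% percentile interval for the gap
ci <- quantile(boot_gaps, c(0.025, 0.975))
print(observed_gap)
print(ci)
```

Unlike the permutation null distribution, the bootstrap distribution is centered near the observed gap, which is why its quantiles can serve as interval endpoints.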
