Let's consider an e-commerce site. We have an A/B test in which we want to measure whether the average revenue from treatment A is statistically significantly different from that of B, i.e. the main goal is to determine whether A is significantly better than B (or vice versa).
Both variants, A and B, generate a continuous stream of revenue values, one per displayed item: the revenue if the item is purchased, or zero otherwise. For most items the revenue is zero. Our measure is the average revenue per displayed item.
The generated revenue has some patterns. For example, the sport category generates much more revenue than other categories. Another example is the day-of-week pattern: we have much less revenue on weekends.
To reduce the variance, I would like to account for these revenue patterns. One option is to aggregate the data. For example, calculate the average revenue per category, calculate the difference in revenue between A and B for each category, and then run a paired t-test on those differences. Alternatively, aggregate the results per day and run the t-test on the per-day differences in revenue.
The aggregation is done only to decrease variance; the research question is still which treatment generates statistically significantly more revenue globally (i.e. considering all days and all categories).
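To make the per-day option concrete, here is a minimal sketch of what I have in mind. The row layout `(day, variant, revenue)`, the function names `daily_means` and `paired_t`, and the toy numbers are all made up for illustration; they are not real data. It computes the mean revenue per displayed item for each day and variant, then the paired t statistic on the daily differences A - B.

```python
import math
import statistics
from collections import defaultdict

def daily_means(rows):
    """rows: iterable of (day, variant, revenue), one row per displayed item.
    Returns {day: {variant: mean revenue per displayed item}}."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for day, variant, revenue in rows:
        totals[day][variant] += revenue
        counts[day][variant] += 1
    return {
        day: {v: totals[day][v] / counts[day][v] for v in totals[day]}
        for day in totals
    }

def paired_t(means):
    """Paired t statistic and degrees of freedom for the daily differences A - B."""
    diffs = [m["A"] - m["B"] for _, m in sorted(means.items())]
    n = len(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)  # sample std (n - 1)
    return statistics.mean(diffs) / std_err, n - 1

# Toy data (hypothetical): most items earn nothing, a few are purchased.
rows = [
    ("mon", "A", 0.0), ("mon", "A", 10.0), ("mon", "B", 0.0), ("mon", "B", 4.0),
    ("tue", "A", 0.0), ("tue", "A", 6.0), ("tue", "B", 0.0), ("tue", "B", 2.0),
    ("wed", "A", 8.0), ("wed", "A", 0.0), ("wed", "B", 0.0), ("wed", "B", 2.0),
]
means = daily_means(rows)
t_stat, dof = paired_t(means)
print(t_stat, dof)
```

With real data, `scipy.stats.ttest_rel` applied to the two lists of daily means should give the same statistic together with a p-value. Note that every day counts equally here, regardless of how many items were displayed on it.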
My questions:
Suppose that I aggregate per day:
Does it make sense to aggregate and then do a paired t-test? One downside of this approach is that we lose the information about the standard deviation within each day.
If the aggregates have different sizes (e.g. each day has different revenue), should I weight them somehow (i.e. give higher-revenue days more weight), and if so, how can I apply weights in a paired t-test?
Any other ideas of how to do it?
This question suggests that pairing is a good idea, but there are some crucial differences. For example, the data in this problem is already aggregated, and there is no solution for the different sizes of the aggregations.
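To illustrate why the differing sizes worry me, here is a small sketch with invented per-day summaries. It assumes, for simplicity, that A and B display the same number of items on a given day. Under that assumption, weighting each daily difference by the number of displayed items reproduces the global difference in revenue per displayed item, while the plain average of the daily differences does not.

```python
# Hypothetical per-day summaries: (items displayed per variant, mean revenue A, mean revenue B).
# Assumption: A and B show the same number of items on a given day.
days = [
    (1000, 0.50, 0.40),  # busy weekday
    (1000, 0.60, 0.50),  # busy weekday
    (100, 0.10, 0.30),   # quiet weekend day
]

diffs = [a - b for _, a, b in days]

# Plain average of daily differences: every day counts the same.
unweighted = sum(diffs) / len(diffs)

# Average of daily differences weighted by the number of displayed items.
total_items = sum(n for n, _, _ in days)
weighted = sum(n * d for (n, _, _), d in zip(days, diffs)) / total_items

# Global difference in revenue per displayed item, pooling all days.
pooled_a = sum(n * a for n, a, _ in days) / total_items
pooled_b = sum(n * b for n, _, b in days) / total_items
pooled = pooled_a - pooled_b

print(unweighted, weighted, pooled)
```

In this toy case the unweighted mean of the daily differences is essentially zero, while the item-weighted mean matches the pooled difference. What I do not know is how to carry such weights into the paired t-test itself.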