Here's the thing: I'm planning to test a new strategy on my website, a typical A/B test for a new homepage. One variation is the current page (the control); the other is the variant with the new strategy. My current conversion rate is 30%. A sample size calculator tells me I need 1,335 visitors per variation, and I can get that much traffic in three days of testing. However, I've seen "A/B testing best practices" saying a test should run for at least a week because of seasonality. Which should prevail: the number of days (at least one week) or the calculated sample size?
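For context, a figure like 1,335 per variation comes out of a standard two-proportion power calculation. The calculator's exact settings aren't stated in the question, so the sketch below assumes a 5 percentage point minimum detectable effect, 80% power, and a two-sided 5% significance level; with those assumptions it lands in the same ballpark:

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

p_control = 0.30   # current conversion rate (from the question)
p_variant = 0.35   # assumed 5-percentage-point minimum detectable effect (not stated in the question)

# Cohen's h effect size for two proportions, then solve for the per-group sample size
effect = proportion_effectsize(p_variant, p_control)
n_per_variation = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80, ratio=1.0, alternative="two-sided"
)
print(round(n_per_variation))  # ~1,376 under these assumptions; calculators using slightly
                               # different formulas or defaults report numbers near 1,335
```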
- Before answering that question: please tell me in detail what the A/B split test is for, and about the product; then we can work out whether there is even any seasonality. – Patrick Bormann Mar 16 '21 at 21:26
- The sample size obtained from a 'power and sample size' procedure is not a recommendation for the maximum sample size you should use, but a recommendation for the minimum necessary to achieve a certain power. – BruceET Mar 16 '21 at 22:15
- @PatrickBormann Even if there might not be seasonality, let's treat it as if there is. That is the point for me: to learn how to deal with it, not only how to solve this specific problem. – dsbr__0 Mar 17 '21 at 10:38
- @BruceET So once I have the minimum number of data points in each sample, my results should be more stable, am I right? I mean, whether I have 2,000 data points, 3,000, or 10,000, the result should be the same. That's the thing I'm worried about. – dsbr__0 Mar 17 '21 at 10:39
- If you have the time to wait for a week, what is the problem? – kjetil b halvorsen Mar 17 '21 at 10:57
- @kjetilbhalvorsen I'm a bit of a layman here, but I read that you should run the test for exactly the number of data points you calculated, and that you shouldn't stop the test at a different moment because that would interfere with the result. That's the thing, I just don't know. – dsbr__0 Mar 17 '21 at 11:28
- For a correct hypothesis test, yes, you should follow the original data collection and analysis plan (or look into more advanced methods). But if you plan to run your test for a week, irrespective of the exact number of points collected in that week, that is enough. Your calculated sample size is a minimum (for a given power) only, and there is no problem in planning for more. But you are probably following the conversion rate routinely, on a daily basis, and plotting it? Then you will know whether there is weekly seasonality! – kjetil b halvorsen Mar 17 '21 at 12:13
- Another point: since you are probably following the conversion rate on a routine basis, you should look into [tag:control-chart] and set one up! – kjetil b halvorsen Mar 22 '21 at 15:32
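The control-chart suggestion in the last comment can be sketched as a simple p-chart on the daily conversion rate. The daily counts below are made up for illustration, not taken from the question:

```python
import numpy as np

# Illustrative daily data (invented numbers, not from the question)
visitors    = np.array([880, 910, 870, 930, 900, 650, 600])   # visitors per day
conversions = np.array([265, 270, 258, 282, 268, 182, 168])   # conversions per day

p_bar = conversions.sum() / visitors.sum()         # centre line of the p-chart
sigma = np.sqrt(p_bar * (1 - p_bar) / visitors)    # day-specific standard error
ucl = p_bar + 3 * sigma                            # upper control limit
lcl = np.clip(p_bar - 3 * sigma, 0, None)          # lower control limit (floored at 0)

daily_rate = conversions / visitors
out_of_control = (daily_rate > ucl) | (daily_rate < lcl)
print(round(p_bar, 3), out_of_control)             # flags days whose rate falls outside the limits
```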
1 Answer
The sample size of an A/B test is based on the number of data points (randomization units), not on the number of experiment days.
The recommendation to run experiments for a minimum of one week is about external validity: you may have a different population of users on weekends than on weekdays, and even the same user may behave differently. If you only run an experiment on weekends, you cannot directly generalize the experimental effect to weekdays.
Reference: Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing, chapter 2
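In practice the two recommendations combine as a maximum, not a choice: plan enough days to cover at least one full weekly cycle and at least the calculated minimum sample size. A minimal sketch with the question's numbers (the daily traffic figure is an assumption inferred from "1,335 per variation in three days"):

```python
import math

n_per_variation = 1335   # minimum per variation from the power calculation
daily_visitors = 900     # assumed total traffic per day, split evenly across both variations

days_for_sample = math.ceil(2 * n_per_variation / daily_visitors)  # days needed to reach the minimum sample
planned_days = max(7, days_for_sample)                             # also honour the one-week minimum
print(days_for_sample, planned_days)  # 3, 7 -> run the full week and analyze all the data collected
```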

xiaoA