Questions tagged [ab-test]

A/B testing, also known as split or bucket testing, is a controlled comparison of the effectiveness of variants of a website, email, or other commercial product.

"A/B test" (also "split test" or "bucket test") is a colloquial term for a controlled experiment in which users are randomly assigned to one of several variants of a product, often a website feature.

The response (dependent) variable is most often count data (such as clicks on links or sales), but it may be a continuous measure (such as time on site). Counts are often converted to rates (e.g., clicks per visitor) before analysis.
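
For example, click counts are usually analyzed as rates (clicks per visitor), often with a confidence interval per variant. A minimal Python sketch, assuming the statsmodels library and made-up variant names and counts:

```python
# Sketch: turning raw click counts into rates with a confidence interval per variant.
# The variant names and counts below are made up for illustration.
from statsmodels.stats.proportion import proportion_confint

variants = {"A": (210, 4000), "B": (252, 4100)}   # (clicks, visitors)
for name, (clicks, visitors) in variants.items():
    rate = clicks / visitors
    lo, hi = proportion_confint(clicks, visitors, alpha=0.05, method="wilson")
    print(f"{name}: rate = {rate:.2%}, 95% CI = ({lo:.2%}, {hi:.2%})")
```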

Because they create temporary variants of live websites, online A/B tests must overcome several challenges that are uncommon in traditional experiments on human preference. For example, differential caching of test versions may degrade website performance for some variants. Users may be shown multiple variants if they return to a website and are not successfully identified by cookies or login information. Moreover, nonhuman activity (search-engine crawlers, email harvesters, and botnets) may be mistaken for human users.

Useful References:

Kohavi, Ron, Randal M. Henne, and Dan Sommerfield. "Practical guide to controlled experiments on the web: listen to your customers not to the HiPPO." Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2007.

Kohavi, Ron, et al. "Trustworthy online controlled experiments: five puzzling outcomes explained." Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2012.

319 questions
24 votes · 3 answers

Safely determining sample size for A/B testing

I am a software engineer looking to build an A/B testing tool. I don't have a solid stats background but have been doing quite a bit of reading over the last few days. I am following the methodology described here and will summarize the relevant…
19 votes · 4 answers

How could one develop a stopping rule in a power analysis of two independent proportions?

I am a software developer working on A/B testing systems. I don't have a solid stats background but have been picking up knowledge over the past few months. A typical test scenario involves comparing two URLs on a website. A visitor visits…
14 votes · 3 answers

When is NHST appropriate in business?

Null hypothesis significance testing seems to be widely used in business. The most obvious example is A/B Testing, where a business will perform a test comparing two variants of some aspect of their business, the old and the new, and switch to the…
14 votes · 2 answers

Why is it wrong to stop an A/B test before the optimal sample size is reached?

I am in charge of presenting the results of A/B tests (run on website variations) at my company. We run the test for a month and then check the p-values at regular intervals until we reach significance (or abandon if significance is not reached…
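
The core problem with this procedure is that repeatedly checking p-values and stopping at the first significant result inflates the false-positive rate well above the nominal 5%. A small simulation sketch of an A/A test (no true difference; all parameters below are made up, not the asker's setup) illustrates the inflation:

```python
# Sketch: simulate A/A tests (no true difference) and check the p-value at several
# interim looks, stopping at the first p < 0.05. Sample size, number of looks, and
# the baseline rate are arbitrary illustrative choices.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(42)
n_sims, n_total, n_looks, p_true = 2000, 10_000, 10, 0.05
checkpoints = np.linspace(n_total // n_looks, n_total, n_looks, dtype=int)

false_positives = 0
for _ in range(n_sims):
    a = rng.binomial(1, p_true, n_total)
    b = rng.binomial(1, p_true, n_total)
    for n in checkpoints:
        _, p = proportions_ztest([a[:n].sum(), b[:n].sum()], [n, n])
        if p < 0.05:
            false_positives += 1
            break

print(f"false-positive rate with peeking: {false_positives / n_sims:.1%} (nominal 5%)")
```
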
13 votes · 5 answers

A/B testing ratio of sums

Context: Consider the following scenario for a company selling goods online. A user can purchase several items (i.e. a basket of items), some of which are of particular importance and are tracked specifically (let's call them star items). We wish to…
13 votes · 3 answers

What statistical test to use for an A/B test?

We have two cohorts of 1000 samples each. We measure two quantities on each cohort. The first is a binary variable. The second is a real number that follows a heavy-tailed distribution. We want to assess which cohort performs better on each metric.…
asked by iliasfl
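
One common pairing for metrics like these (a sketch of possible tests, not necessarily what the answers recommend) is a two-proportion z-test for the binary outcome and a rank-based test such as Mann-Whitney U for the heavy-tailed one; a Python sketch with simulated cohorts:

```python
# Sketch: one possible test per metric, with simulated data standing in for the cohorts.
import numpy as np
from scipy.stats import mannwhitneyu
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(0)
n = 1000

# Binary metric: two-proportion z-test on conversion counts
conv_a = rng.binomial(1, 0.10, n)
conv_b = rng.binomial(1, 0.12, n)
_, p_binary = proportions_ztest([conv_a.sum(), conv_b.sum()], [n, n])

# Heavy-tailed metric (e.g., revenue): rank-based Mann-Whitney U test
rev_a = rng.lognormal(mean=3.0, sigma=1.5, size=n)
rev_b = rng.lognormal(mean=3.1, sigma=1.5, size=n)
_, p_heavy = mannwhitneyu(rev_a, rev_b, alternative="two-sided")

print(f"binary metric p = {p_binary:.3f}, heavy-tailed metric p = {p_heavy:.3f}")
```
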
12 votes · 0 answers

Understanding Sequential Probability Ratio Test (SPRT) Likelihood Ratio

I am a software developer looking to develop an alternative for the simple hypothesis testing scheme described here. In short, the test works as follows: Two URLs are compared for their ability to convert visitors. Discrete samples are captured.…
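
For readers unfamiliar with the mechanics, here is a minimal SPRT sketch for Bernoulli conversion data; the hypothesized rates and error levels are illustrative assumptions, not values from the question:

```python
# Minimal SPRT sketch for Bernoulli data: test H0: p = p0 vs H1: p = p1 (with p1 > p0).
# p0, p1, alpha, and beta below are illustrative choices, not values from the question.
import math

def sprt_decision(observations, p0=0.05, p1=0.07, alpha=0.05, beta=0.20):
    """Return 'accept H1', 'accept H0', or 'continue' after the given 0/1 observations."""
    upper = math.log((1 - beta) / alpha)   # crossing above -> accept H1
    lower = math.log(beta / (1 - alpha))   # crossing below -> accept H0
    llr = 0.0                              # cumulative log-likelihood ratio
    for x in observations:
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1"
        if llr <= lower:
            return "accept H0"
    return "continue"

print(sprt_decision([0, 0, 1, 0, 1, 0, 0, 1, 0, 0]))
```
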
11 votes · 2 answers

Validate web A/B tests by re-running an experiment - is this valid?

A webinar the other day by an A/B testing company had their resident "Data Scientist" explain that you should validate your results by re-running the experiment. The premise was that, if you select 95% confidence, there is a 5% (1/20) chance of a false…
11 votes · 2 answers

AB test sample size calculation by hand

Evan Miller has created a well-known online AB test sample size calculator. For the sake of being able to program and modify this formula, I would like to know how to calculate sample size Evan Miller-style by hand. Personally, I'll calculate such…
asked by zthomas.nc
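
For reference, the standard per-group sample-size formula for a two-sided comparison of two proportions can be coded directly; it is close to, though not necessarily identical to, what the calculator implements:

```python
# Sketch: standard per-group sample size for detecting p1 vs p2 with a two-sided
# two-proportion z-test. This is the usual textbook formula; it may differ slightly
# from what Evan Miller's calculator implements.
from math import ceil, sqrt
from scipy.stats import norm

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)      # two-sided significance level
    z_beta = norm.ppf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)  # n per group

# Example: baseline 10% conversion rate, detecting an absolute lift to 12%
print(sample_size_two_proportions(0.10, 0.12))
```
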
11 votes · 1 answer

Formula for Bayesian A/B Testing doesn't make any sense

I'm using the formula from Bayesian A/B testing in order to compute the results of an A/B test using Bayesian methodology. $$ \Pr(p_B > p_A) = \sum^{\alpha_B-1}_{i=0} \frac{B(\alpha_A+i,\beta_B+\beta_A)}{(\beta_B+i)B(1+i,\beta_B)B(\alpha_A, \beta_A)} $$ …
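
For concreteness, the quoted summation can be evaluated directly from the Beta-posterior parameters; a Python sketch using log-Beta functions for numerical stability (the example counts and the uniform priors are illustrative):

```python
# Sketch: closed-form Pr(p_B > p_A) for Beta(alpha_A, beta_A) and Beta(alpha_B, beta_B)
# posteriors, following the summation quoted above; log-Beta is used for numerical
# stability. The example counts and the Beta(1, 1) priors are illustrative.
import math
from scipy.special import betaln

def prob_b_beats_a(alpha_a, beta_a, alpha_b, beta_b):
    total = 0.0
    for i in range(int(alpha_b)):  # sum over i = 0 .. alpha_B - 1 (integer alpha_B)
        total += math.exp(
            betaln(alpha_a + i, beta_a + beta_b)
            - math.log(beta_b + i)
            - betaln(1 + i, beta_b)
            - betaln(alpha_a, beta_a)
        )
    return total

# Example: A converts 20/200, B converts 30/200, both with Beta(1, 1) priors
print(prob_b_beats_a(1 + 20, 1 + 180, 1 + 30, 1 + 170))
```
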
10 votes · 3 answers

Bayesian AB testing

I am running an A/B test on a page that receives only 5k visits per month. It would take too long to reach the traffic levels necessary to measure a ±1% difference between the test and control. I have heard that I can use Bayesian stats to give me a…
asked by Bi-Gnomial
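
A Monte Carlo alternative to the closed-form sum above, sketched with illustrative low-traffic counts and uniform Beta(1, 1) priors:

```python
# Sketch: Monte Carlo estimate of Pr(p_B > p_A) from Beta posteriors.
# Counts (a low-traffic example) and the Beta(1, 1) priors are illustrative.
import numpy as np

rng = np.random.default_rng(1)
conv_a, n_a = 110, 2500   # conversions, visitors for control
conv_b, n_b = 128, 2500   # conversions, visitors for test

draws = 200_000
p_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, draws)
p_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, draws)
print(f"Pr(p_B > p_A) ≈ {(p_b > p_a).mean():.3f}")
```
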
10 votes · 1 answer

R - power.prop.test, prop.test, and unequal sample sizes in A/B tests

Say I want to know what sample size I need for an experiment in which I'm seeking to determine whether or not the difference in two proportions of success is statistically significant. Here is my current process: Look at historical data to…
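
The question concerns R's power.prop.test; for comparison, a rough Python analogue (not the same function) that handles unequal group sizes via a ratio argument, with illustrative proportions:

```python
# Sketch: an approximate Python analogue of R's power.prop.test for unequal group
# sizes, using Cohen's h as the effect size. The proportions and ratio are illustrative.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

p1, p2 = 0.10, 0.12                       # baseline and target conversion rates (assumed)
effect = proportion_effectsize(p1, p2)    # Cohen's h

n1 = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.80,
                                  ratio=2.0, alternative="two-sided")  # n2 = ratio * n1
print(f"n1 = {n1:.0f}, n2 = {2.0 * n1:.0f}")
```
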
9 votes · 3 answers

Difference between G-test and t-test and which should be used for A/B testing?

The G-test is a way to get quick estimates of a chi-squared distribution, and is recommended by the author of this well-known A/B test tutorial. This tool assumes a normal distribution and uses a difference of means to compute confidence. What is the…
asked by Kevin Burke
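
To make the comparison concrete, the G statistic for a 2×2 conversion table is G = 2·Σ O·ln(O/E); a Python sketch with illustrative counts, computed both by hand and via scipy:

```python
# Sketch: G-test (log-likelihood ratio test) on a 2x2 conversion table, computed both
# by hand and via scipy's chi2_contingency with lambda_="log-likelihood".
# The counts are illustrative.
import numpy as np
from scipy.stats import chi2_contingency

# rows = variants A and B; columns = converted, not converted
table = np.array([[120, 880],
                  [150, 850]])

g_stat, p_value, dof, expected = chi2_contingency(table, correction=False,
                                                  lambda_="log-likelihood")
# Same statistic by hand: G = 2 * sum(observed * ln(observed / expected))
g_by_hand = 2 * np.sum(table * np.log(table / expected))
print(f"G = {g_stat:.3f} (by hand: {g_by_hand:.3f}), dof = {dof}, p = {p_value:.3f}")
```
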
8 votes · 1 answer

How does a frequentist calculate the chance that group A beats group B regarding a binary response?

... (optional) within the context of Google Web Optimizer. Suppose you have two groups and a binary response variable. Now you get the following outcome: Original: 401 trials, 125 successful trials; Combination16: 441 trials, 141 successful…
asked by mlwida
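
With the counts quoted in the excerpt, the usual frequentist summary is a two-proportion z-test, which gives a p-value for the difference rather than a direct "chance that A beats B"; a sketch:

```python
# Sketch: two-proportion z-test on the counts quoted in the question
# (Original: 125/401, Combination16: 141/441). This yields a p-value for the
# difference in conversion rates, not a posterior probability that one beats the other.
from statsmodels.stats.proportion import proportions_ztest

successes = [125, 141]
trials = [401, 441]
stat, p_value = proportions_ztest(successes, trials)
print(f"rates: {125/401:.3f} vs {141/441:.3f}, z = {stat:.3f}, p = {p_value:.3f}")
```
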
8 votes · 5 answers

Why do I need statistical power for AB testing if my results are significant?

I have been told that I need both significance and power for my A/B results to be valid. I have researched this a lot, and the above statement does not make sense to me. I get that we need high enough power to not reject the null hypothesis and assuming that…
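
One reason power still matters even when a result is significant: in an underpowered test, the estimates that do cross the significance threshold tend to be exaggerated (and occasionally have the wrong sign). A small simulation sketch with made-up parameters:

```python
# Sketch: with low power, the effects that reach p < 0.05 tend to be overestimated
# (the "winner's curse"). True rates, sample size, and simulation count are illustrative.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(7)
p_a, p_b, n, sims = 0.050, 0.055, 1000, 5000   # small true lift, underpowered n

significant_lifts = []
for _ in range(sims):
    a = rng.binomial(n, p_a)                    # conversions in control
    b = rng.binomial(n, p_b)                    # conversions in treatment
    _, p = proportions_ztest([a, b], [n, n])
    if p < 0.05:
        significant_lifts.append(b / n - a / n)

print(f"true lift: {p_b - p_a:.3f}")
print(f"significant in {len(significant_lifts) / sims:.1%} of runs; "
      f"mean estimated lift among those: {np.mean(significant_lifts):.3f}")
```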