Questions tagged [ab-test]

A/B testing, also known as split or bucket testing, is a controlled comparison of the effectiveness of variants of a website, email, or other commercial product.

"A/B test" (also "split test" or "bucket test") is a colloquial term for a controlled experiment in which users are randomly assigned to one of several variants of a product, often a website feature.

The response (dependent) variable is most often count data (such as clicks on links or sales), but it may be a continuous measure (such as time on site). Counts are often converted to rates (e.g., clicks per visitor) before analysis.
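
For example, click counts are usually analyzed as rates (clicks per visitor), often with a confidence interval per variant. A minimal Python sketch, assuming the statsmodels library and made-up variant names and counts:

```python
# Sketch: turning raw click counts into rates with a confidence interval per variant.
# The variant names and counts below are made up for illustration.
from statsmodels.stats.proportion import proportion_confint

variants = {"A": (210, 4000), "B": (252, 4100)}   # (clicks, visitors)
for name, (clicks, visitors) in variants.items():
    rate = clicks / visitors
    lo, hi = proportion_confint(clicks, visitors, alpha=0.05, method="wilson")
    print(f"{name}: rate = {rate:.2%}, 95% CI = ({lo:.2%}, {hi:.2%})")
```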

Because they create temporary variants of live websites, online A/B tests must overcome several challenges that are uncommon in traditional experiments on human preference. For example, differential caching of test versions may degrade website performance for some variants. Users may be shown multiple variants if they return to a website and are not successfully identified by cookies or login information. Moreover, nonhuman activity (search-engine crawlers, email harvesters, and botnets) may be mistaken for human users.

Useful References:

Kohavi, Ron, Randal M. Henne, and Dan Sommerfield. "Practical guide to controlled experiments on the web: listen to your customers not to the HiPPO." Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2007.

Kohavi, Ron, et al. "Trustworthy online controlled experiments: five puzzling outcomes explained." Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2012.

319 questions
24 votes · 3 answers

Safely determining sample size for A/B testing

I am a software engineer looking to build an A/B testing tool. I don't have a solid stats background but have been doing quite a bit of reading over the last few days. I am following the methodology described here and will summarize the relevant…
19 votes · 4 answers

How could one develop a stopping rule in a power analysis of two independent proportions?

I am a software developer working on A/B testing systems. I don't have a solid stats background but have been picking up knowledge over the past few months. A typical test scenario involves comparing two URLs on a website. A visitor visits…
14 votes · 3 answers

When is NHST appropriate in business?

Null hypothesis significance testing seems to be widely used in business. The most obvious example is A/B Testing, where a business will perform a test comparing two variants of some aspect of their business, the old and the new, and switch to the…
14 votes · 2 answers

Why is it wrong to stop an A/B test before the optimal sample size is reached?

I am in charge of presenting the results of A/B tests (run on website variations) at my company. We run the test for a month and then check the p-values at regular intervals until we reach significance (or abandon if significance is not reached…
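
The core problem with this procedure is that repeatedly checking p-values and stopping at the first significant result inflates the false-positive rate well above the nominal 5%. A small simulation sketch of an A/A test (no true difference; all parameters below are made up, not the asker's setup) illustrates the inflation:

```python
# Sketch: simulate A/A tests (no true difference) and check the p-value at several
# interim looks, stopping at the first p < 0.05. Sample size, number of looks, and
# the baseline rate are arbitrary illustrative choices.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(42)
n_sims, n_total, n_looks, p_true = 2000, 10_000, 10, 0.05
checkpoints = np.linspace(n_total // n_looks, n_total, n_looks, dtype=int)

false_positives = 0
for _ in range(n_sims):
    a = rng.binomial(1, p_true, n_total)
    b = rng.binomial(1, p_true, n_total)
    for n in checkpoints:
        _, p = proportions_ztest([a[:n].sum(), b[:n].sum()], [n, n])
        if p < 0.05:
            false_positives += 1
            break

print(f"false-positive rate with peeking: {false_positives / n_sims:.1%} (nominal 5%)")
```
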
13 votes · 5 answers

A/B testing ratio of sums

Context: Consider the following scenario for a company selling goods online. A user can purchase several items (i.e. a basket of items), some of which are of particular importance and are tracked specifically (let's call them star items). We wish to…
13 votes · 3 answers

What statistical test to use for an A/B test?

We have two cohorts of 1000 samples each. We measure two quantities on each cohort. The first is a binary variable. The second is a real number that follows a heavy-tailed distribution. We want to assess which cohort performs better on each metric.…
asked by iliasfl
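
One common pairing for metrics like these (a sketch of possible tests, not necessarily what the answers recommend) is a two-proportion z-test for the binary outcome and a rank-based test such as Mann-Whitney U for the heavy-tailed one; a Python sketch with simulated cohorts:

```python
# Sketch: one possible test per metric, with simulated data standing in for the cohorts.
import numpy as np
from scipy.stats import mannwhitneyu
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(0)
n = 1000

# Binary metric: two-proportion z-test on conversion counts
conv_a = rng.binomial(1, 0.10, n)
conv_b = rng.binomial(1, 0.12, n)
_, p_binary = proportions_ztest([conv_a.sum(), conv_b.sum()], [n, n])

# Heavy-tailed metric (e.g., revenue): rank-based Mann-Whitney U test
rev_a = rng.lognormal(mean=3.0, sigma=1.5, size=n)
rev_b = rng.lognormal(mean=3.1, sigma=1.5, size=n)
_, p_heavy = mannwhitneyu(rev_a, rev_b, alternative="two-sided")

print(f"binary metric p = {p_binary:.3f}, heavy-tailed metric p = {p_heavy:.3f}")
```
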
12 votes · 0 answers

Understanding Sequential Probability Ratio Test (SPRT) Likelihood Ratio

I am a software developer looking to develop an alternative for the simple hypothesis testing scheme described here. In short, the test works as follows: Two URLs are compared for their ability to convert visitors. Discrete samples are captured.…
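
For readers unfamiliar with the mechanics, here is a minimal SPRT sketch for Bernoulli conversion data; the hypothesized rates and error levels are illustrative assumptions, not values from the question:

```python
# Minimal SPRT sketch for Bernoulli data: test H0: p = p0 vs H1: p = p1 (with p1 > p0).
# p0, p1, alpha, and beta below are illustrative choices, not values from the question.
import math

def sprt_decision(observations, p0=0.05, p1=0.07, alpha=0.05, beta=0.20):
    """Return 'accept H1', 'accept H0', or 'continue' after the given 0/1 observations."""
    upper = math.log((1 - beta) / alpha)   # crossing above -> accept H1
    lower = math.log(beta / (1 - alpha))   # crossing below -> accept H0
    llr = 0.0                              # cumulative log-likelihood ratio
    for x in observations:
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1"
        if llr <= lower:
            return "accept H0"
    return "continue"

print(sprt_decision([0, 0, 1, 0, 1, 0, 0, 1, 0, 0]))
```
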
11 votes · 2 answers

Validate web A/B tests by re-running an experiment - is this valid?

A webinar the other day by an A/B testing company had their resident "Data Scientist" explain that you should validate your results by re-running the experiment. The premise was that, if you select 95% confidence, there is a 5% (1/20) chance of a false…
11 votes · 2 answers

AB test sample size calculation by hand

Evan Miller has created a well-known online AB test sample size calculator. For the sake of being able to program and modify this formula, I would like to know how to calculate sample size Evan Miller-style by hand. Personally, I'll calculate such…
asked by zthomas.nc
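
For reference, the standard per-group sample-size formula for a two-sided comparison of two proportions can be coded directly; it is close to, though not necessarily identical to, what the calculator implements:

```python
# Sketch: standard per-group sample size for detecting p1 vs p2 with a two-sided
# two-proportion z-test. This is the usual textbook formula; it may differ slightly
# from what Evan Miller's calculator implements.
from math import ceil, sqrt
from scipy.stats import norm

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)      # two-sided significance level
    z_beta = norm.ppf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)  # n per group

# Example: baseline 10% conversion rate, detecting an absolute lift to 12%
print(sample_size_two_proportions(0.10, 0.12))
```
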
11 votes · 1 answer

Formula for Bayesian A/B Testing doesn't make any sense

I'm using the formula from Bayesian A/B testing in order to compute the results of an A/B test using Bayesian methodology. $$ \Pr(p_B > p_A) = \sum^{\alpha_B-1}_{i=0} \frac{B(\alpha_A+i,\beta_B+\beta_A)}{(\beta_B+i)B(1+i,\beta_B)B(\alpha_A, \beta_A)} $$ …
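
For concreteness, the quoted summation can be evaluated directly from the Beta-posterior parameters; a Python sketch using log-Beta functions for numerical stability (the example counts and the uniform priors are illustrative):

```python
# Sketch: closed-form Pr(p_B > p_A) for Beta(alpha_A, beta_A) and Beta(alpha_B, beta_B)
# posteriors, following the summation quoted above; log-Beta is used for numerical
# stability. The example counts and the Beta(1, 1) priors are illustrative.
import math
from scipy.special import betaln

def prob_b_beats_a(alpha_a, beta_a, alpha_b, beta_b):
    total = 0.0
    for i in range(int(alpha_b)):  # sum over i = 0 .. alpha_B - 1 (integer alpha_B)
        total += math.exp(
            betaln(alpha_a + i, beta_a + beta_b)
            - math.log(beta_b + i)
            - betaln(1 + i, beta_b)
            - betaln(alpha_a, beta_a)
        )
    return total

# Example: A converts 20/200, B converts 30/200, both with Beta(1, 1) priors
print(prob_b_beats_a(1 + 20, 1 + 180, 1 + 30, 1 + 170))
```
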
10 votes · 3 answers

Bayesian AB testing

I am running an A/B test on a page that receives only 5k visits per month. It would take too long to reach the traffic levels necessary to measure a ±1% difference between the test and control. I have heard that I can use Bayesian stats to give me a…
asked by Bi-Gnomial
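
A Monte Carlo alternative to the closed-form sum above, sketched with illustrative low-traffic counts and uniform Beta(1, 1) priors:

```python
# Sketch: Monte Carlo estimate of Pr(p_B > p_A) from Beta posteriors.
# Counts (a low-traffic example) and the Beta(1, 1) priors are illustrative.
import numpy as np

rng = np.random.default_rng(1)
conv_a, n_a = 110, 2500   # conversions, visitors for control
conv_b, n_b = 128, 2500   # conversions, visitors for test

draws = 200_000
p_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, draws)
p_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, draws)
print(f"Pr(p_B > p_A) ≈ {(p_b > p_a).mean():.3f}")
```
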
10 votes · 1 answer

R - power.prop.test, prop.test, and unequal sample sizes in A/B tests

Say I want to know what sample size I need for an experiment in which I'm seeking to determine whether or not the difference in two proportions of success is statistically significant. Here is my current process: Look at historical data to…
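
The question concerns R's power.prop.test; for comparison, a rough Python analogue (not the same function) that handles unequal group sizes via a ratio argument, with illustrative proportions:

```python
# Sketch: an approximate Python analogue of R's power.prop.test for unequal group
# sizes, using Cohen's h as the effect size. The proportions and ratio are illustrative.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

p1, p2 = 0.10, 0.12                       # baseline and target conversion rates (assumed)
effect = proportion_effectsize(p1, p2)    # Cohen's h

n1 = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.80,
                                  ratio=2.0, alternative="two-sided")  # n2 = ratio * n1
print(f"n1 = {n1:.0f}, n2 = {2.0 * n1:.0f}")
```
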
9 votes · 3 answers

Difference between G-test and t-test and which should be used for A/B testing?

The G-test is a way to get quick estimates of a chi-squared distribution, and is recommended by the author of this well-known A/B test tutorial. This tool assumes a normal distribution and uses a difference of means to compute confidence. What is the…
asked by Kevin Burke
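
To make the comparison concrete, the G statistic for a 2×2 conversion table is G = 2·Σ O·ln(O/E); a Python sketch with illustrative counts, computed both by hand and via scipy:

```python
# Sketch: G-test (log-likelihood ratio test) on a 2x2 conversion table, computed both
# by hand and via scipy's chi2_contingency with lambda_="log-likelihood".
# The counts are illustrative.
import numpy as np
from scipy.stats import chi2_contingency

# rows = variants A and B; columns = converted, not converted
table = np.array([[120, 880],
                  [150, 850]])

g_stat, p_value, dof, expected = chi2_contingency(table, correction=False,
                                                  lambda_="log-likelihood")
# Same statistic by hand: G = 2 * sum(observed * ln(observed / expected))
g_by_hand = 2 * np.sum(table * np.log(table / expected))
print(f"G = {g_stat:.3f} (by hand: {g_by_hand:.3f}), dof = {dof}, p = {p_value:.3f}")
```
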
8 votes · 1 answer

How does a frequentist calculate the chance that group A beats group B regarding a binary response?

... (optional) within the context of Google Web Optimizer. Suppose you have two groups and a binary response variable. Now you get the following outcome: Original: 401 trials, 125 successful trials; Combination16: 441 trials, 141 successful…
asked by mlwida
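
With the counts quoted in the excerpt, the usual frequentist summary is a two-proportion z-test, which gives a p-value for the difference rather than a direct "chance that A beats B"; a sketch:

```python
# Sketch: two-proportion z-test on the counts quoted in the question
# (Original: 125/401, Combination16: 141/441). This yields a p-value for the
# difference in conversion rates, not a posterior probability that one beats the other.
from statsmodels.stats.proportion import proportions_ztest

successes = [125, 141]
trials = [401, 441]
stat, p_value = proportions_ztest(successes, trials)
print(f"rates: {125/401:.3f} vs {141/441:.3f}, z = {stat:.3f}, p = {p_value:.3f}")
```
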
8 votes · 5 answers

Why do I need statistical power for AB testing if my results are significant?

I have been told that I need both significance and power for my A/B results to be valid. I have researched this a lot, and the above statement does not make sense to me. I get that we need high enough power to not reject the null hypothesis and assuming that…
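
One reason power still matters even when a result is significant: in an underpowered test, the estimates that do cross the significance threshold tend to be exaggerated (and occasionally have the wrong sign). A small simulation sketch with made-up parameters:

```python
# Sketch: with low power, the effects that reach p < 0.05 tend to be overestimated
# (the "winner's curse"). True rates, sample size, and simulation count are illustrative.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(7)
p_a, p_b, n, sims = 0.050, 0.055, 1000, 5000   # small true lift, underpowered n

significant_lifts = []
for _ in range(sims):
    a = rng.binomial(n, p_a)                    # conversions in control
    b = rng.binomial(n, p_b)                    # conversions in treatment
    _, p = proportions_ztest([a, b], [n, n])
    if p < 0.05:
        significant_lifts.append(b / n - a / n)

print(f"true lift: {p_b - p_a:.3f}")
print(f"significant in {len(significant_lifts) / sims:.1%} of runs; "
      f"mean estimated lift among those: {np.mean(significant_lifts):.3f}")
```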