
... (optional) within the context of Google Web Optimizer.

Suppose you have two groups and a binary response variable, and you observe the following outcome:

  • Original: 401 trials, 125 successful trials
  • Combination16: 441 trials, 144 successful trials

The difference is not statistically significant; nevertheless, one can calculate the probability that Combination16 will beat Original.

To calculate "Chance to beat Original" I have used a Bayesian approach, i.e. performing a two-dimensional Monte Carlo integration over the posterior distributions (Beta distributions with a uniform Beta(1,1) prior, hence the "+1" terms in the code). Here is the code:

trials <- 10000  # number of Monte Carlo draws
# Posterior draws under a uniform Beta(1,1) prior (hence the "+1" terms):
resDat <- data.frame(orig = rbeta(trials, 125 + 1, 401 - 125 + 1),
                     opt  = rbeta(trials, 144 + 1, 441 - 144 + 1))
# Fraction of draws in which the combination beats the original:
mean(resDat$opt > resDat$orig)

This results in approximately 0.6764 (the exact value varies from run to run without a fixed seed).
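Since the two posteriors are independent Beta distributions, the Monte Carlo estimate can be cross-checked against a deterministic one-dimensional integral, here sketched with R's `integrate` using the same counts and the same uniform Beta(1,1) prior as the simulation:

```r
# P(opt > orig) for independent Beta posteriors, computed by quadrature:
# orig ~ Beta(125 + 1, 401 - 125 + 1), opt ~ Beta(144 + 1, 441 - 144 + 1).
# The integrand is (density of opt at p) * P(orig < p).
f <- function(p) dbeta(p, 144 + 1, 441 - 144 + 1) * pbeta(p, 125 + 1, 401 - 125 + 1)
integrate(f, 0, 1)$value  # ~ 0.68, consistent with the simulation
```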

Which technique would a frequentist use to calculate "Chance to beat ..."? Maybe the power function of Fisher's exact test?

Optional: Context of Google Web Optimizer

Google Web Optimizer is a tool for running multivariate tests and A/B tests. This is only background, since it should not matter for the question itself.

The example presented above was taken from the explanation page of Google Web Optimizer (GWO), which you can find here (please scroll down to the section "Estimated Conversion Rate Ranges"), specifically from figure 2.

Here GWO reports 67.8% for "Chance to beat Original", which differs slightly from my result. I suspect Google uses a more frequentist-like approach, and I wondered: what could it be?
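For what it is worth, one frequentist-flavoured calculation that lands close to GWO's figure is a normal approximation to the sampling distribution of the difference in sample proportions, $\Phi(\hat\Delta / \mathrm{se}(\hat\Delta))$. This is only a guess at what GWO might compute, using the counts from the simulation code (125/401 and 144/441):

```r
p1 <- 125 / 401  # observed conversion rate, Original
p2 <- 144 / 441  # observed conversion rate, Combination16
se <- sqrt(p1 * (1 - p1) / 401 + p2 * (1 - p2) / 441)  # s.e. of the difference
pnorm((p2 - p1) / se)  # ~ 0.678
```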

EDIT: Since this question was close to disappearing (I guess because of its overly specific nature), I have rephrased it to be of general interest.

mlwida
  • In the frequentist viewpoint, Original either beats Combination, or it doesn't. There is no "chance" or probability involved. – charles.y.zheng Apr 21 '11 at 17:45
  • @charles.y.zheng Hm ... one can calculate the power of a test, i.e. the probability that the null hypothesis is rejected given assumed true parameters. What would you call that? – mlwida Apr 21 '11 at 18:34
  • @steffen: that is called the significance level, or $\alpha$. The power of a test is how often it rejects the null hypothesis when the alternative is true. – charles.y.zheng Apr 22 '11 at 09:24
  • @charles.y.zheng I knew that ;). If you think that such a probability cannot be calculated by frequentists, why not submit that as an answer? If the community agrees, I am happy to accept it :). – mlwida Apr 22 '11 at 09:26
  • @steffen: The significance level of a test is easy to obtain by calculation or simulation. The power level of a test is only defined with respect to a specific alternative. That is why it is not possible to calculate a general "power" of a test; such a notion cannot be defined. – charles.y.zheng Apr 22 '11 at 09:34
  • @charles.y.zheng I suggest you submit that as an answer, then. – mlwida Apr 22 '11 at 09:47
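The point in the comments above, that power is only defined with respect to a specific alternative, can be made concrete with base R's `power.prop.test` (the true rates below are illustrative assumptions, not estimates from the data):

```r
# Power of a two-sided two-proportion test with ~420 subjects per group
# depends entirely on which alternative (pair of true rates) you assume:
power.prop.test(n = 420, p1 = 0.31, p2 = 0.33)$power  # small assumed effect: low power
power.prop.test(n = 420, p1 = 0.31, p2 = 0.40)$power  # large assumed effect: high power
```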

1 Answer


I will take this as an opportunity to explain some fundamental issues regarding the difference between frequentist and Bayesian statistics, by interpreting frequentist practices from a Bayesian standpoint.

In this example, we have observed data $D_1$ for the original and data $D_2$ for the combination case. One assumes that these are generated by Bernoulli random variables with parameters $p_1$ and $p_2$, respectively, and that these parameters come from the priors, $f_i(p_i)$ (with cdfs $F_i(p_i)$). The probability $p_1 > p_2$ can be calculated, as you pointed out. It is:

$$ P[p_1 > p_2; f_1, f_2] = \frac{\int_0^1 \int_0^1 I(p_1 > p_2)\, P[D_1|p_1]\, P[D_2|p_2]\, dF_1(p_1)\, dF_2(p_2)}{\int_0^1 \int_0^1 P[D_1|p_1]\, P[D_2|p_2]\, dF_1(p_1)\, dF_2(p_2)} $$

Here the Bayesian chooses priors $f_1(p_1)$ and $f_2(p_2)$ (and will usually choose the same prior for both, due to exchangeability) and proceeds with inference.
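For the Beta priors used in the question, this reduces to a one-dimensional integral via the standard conjugate-prior argument: with $s_i$ successes in $n_i$ trials and a Beta$(a,b)$ prior, the posterior of each $p_i$ is again a Beta distribution, and

$$ P[p_1 > p_2 \mid D_1, D_2] = \int_0^1 f_{\mathrm{Beta}(s_1 + a,\; n_1 - s_1 + b)}(p)\, F_{\mathrm{Beta}(s_2 + a,\; n_2 - s_2 + b)}(p)\, dp, $$

where $f$ and $F$ denote the Beta density and cdf. This is exactly the quantity the Monte Carlo simulation in the question estimates (with $a = b = 1$).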

The frequentist takes a "conservative" approach when choosing a prior. The possible values of the parameters $p_i$ are assumed to be known, but the frequentist has so little confidence in their ability to assign a meaningful prior that they effectively consider all possible priors, and only make an inferential statement when it is true under every one of them. When no inference is valid under all possible priors, the frequentist remains silent.

That is the situation in this case. When one considers the priors $g_{\theta_i}(p_i)$ given by:

$$ g_{\theta_i}(p_i) = \delta(p_i - \theta_i) $$

that is, the point mass concentrated at $\theta_i$, then one can easily see that the probability desired is

$$ P[p_1 > p_2; g_{\theta_1}, g_{\theta_2}] = I(\theta_1 > \theta_2) $$

that is, 1 when $\theta_1 > \theta_2$ and 0 otherwise; the answer depends entirely on the unknown values $\theta_1, \theta_2$.

Thus the frequentist remains silent. (Or, alternatively, makes the trivial statement: "The probability is between 0 and 1...")

charles.y.zheng
  • Sorry, I was wrong. I finally learned (among others [here](http://stats.stackexchange.com/questions/2356/are-there-any-examples-where-bayesian-credible-intervals-are-obviously-inferior-t)) that a frequentist cannot attach a probability to a statement about parameters given empirical data. Hence my follow-up ideas (which I did not reveal) about how a frequentist would answer my question were all wrong, too. I am a little unsure, however, since the question got 4 upvotes but your answer not a single one :(. – mlwida May 03 '11 at 13:18
  • Still, I am not comfortable with the mixing of Bayesian and frequentist ideas (e.g. when you describe how frequentists deal with priors, which they don't, do they?). Maybe the answer is simply what you put in the comments: a frequentist cannot answer the question, since it is ill-posed in his world view (as Dikran wrote [here](http://stats.stackexchange.com/questions/2356/are-there-any-examples-where-bayesian-credible-intervals-are-obviously-inferior-t/6431#6431)). Sorry again for not believing you earlier. – mlwida May 03 '11 at 13:32
  • 3
    Perhaps my interpretation was not as mainstream as I believed, but there is nothing intrinsically wrong about putting frequentist and Bayesian methods on the same footing. See Lehmann and Casella's Theory of Point Estimation, in which frequentist and Bayesian methods are compared via statistical decision theory. – charles.y.zheng May 03 '11 at 21:10