63

I have three groups of data, each with a binomial distribution (i.e. each group has elements that are either successes or failures). I do not have a predicted probability of success; I can only rely on the success rate of each sample as an approximation of the true success rate. I have only found this question, which is close but does not seem to deal with exactly this scenario.

To simplify the test, let's say that I have 2 groups (the case of 3 groups can be extended from this base case).

| Group   | Trials $n_i$ | Successes $k_i$ | Percentage $p_i$ |
|---------|--------------|-----------------|------------------|
| Group 1 | 2455         | 1556            | 63.4%            |
| Group 2 | 2730         | 1671            | 61.2%            |

I don't have an expected success probability, only what I know from the samples.

The success rates of the two samples are fairly close, but my sample sizes are also quite large. If I check the CDF of the binomial distribution to see how extreme one sample's count is when I treat the other sample's success rate as the null probability, I get a very small probability that such a result could be achieved.

In Excel:

```
1-BINOM.DIST(1556,2455,61.2%,TRUE) = 0.012
```
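For reference, the equivalent upper-tail probability in R (`BINOM.DIST(..., TRUE)` is the cumulative distribution, i.e. `pbinom`):

```r
# P(X > 1556) for X ~ Binomial(2455, 0.612), i.e. the upper tail
1 - pbinom(1556, 2455, 0.612)   # ≈ 0.012, matching the Excel result
```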

However, this does not take into account any variance of the other sample's estimate; it simply treats one observed rate as the true probability.

Is there a better way to test if these two samples of data are actually statistically different from one another?

Scott
  • Another question I came across that didn't really help much: http://stats.stackexchange.com/questions/82059/determining-statistical-significance-of-difference-between-two-binomial-distribu – Scott Aug 28 '14 at 17:15
  • Does this question help? http://stats.stackexchange.com/questions/25299/comparing-two-binary-variables-of-unequal-sizes – Eric Aug 28 '14 at 17:32
  • In R, you could use `prop.test`: `prop.test(c(1556, 1671), c(2455, 2730))`. – COOLSerdash Aug 28 '14 at 17:48
  • @COOLSerdash you might want to give the answer in excel since that is what they seem to be using. – Dan Aug 28 '14 at 17:50
  • @Scott here is a link on how to perform a Chi Square test in excel: http://office.microsoft.com/en-us/excel-help/chisq-test-function-HP010335674.aspx – Dan Aug 28 '14 at 17:51
  • Could be done as a two-sample (binomial) proportions test, or a 2x2 chi-square. – Glen_b Aug 28 '14 at 17:58
  • Extending the base case from two groups to three could be problematic, because the tests will be interdependent: you will need a binomial version of ANOVA to handle that. – whuber Mar 21 '16 at 01:42
  • @whuber, I was worried about the same thing. I did build this out, so I tested the 3!/(2!*1!) = 3 pairwise combinations against each other. I would really like to test all 3 against each other in aggregate. Could you add an answer that addresses the three-group situation? The selected answer does point out that this doesn't generalize to 3 groups, but it does suggest a solution. – Scott Mar 21 '16 at 18:46
  • I haven't the time, but I would direct your attention to logistic regression (with three planned contrasts). – whuber Mar 21 '16 at 18:48

6 Answers

53

The solution is a simple Google search away: http://en.wikipedia.org/wiki/Statistical_hypothesis_testing

So you would like to test the following null hypothesis against the given alternative:

$H_0:p_1=p_2$ versus $H_A:p_1\neq p_2$

So you just need to calculate the test statistic, which is

$$z=\frac{\hat p_1-\hat p_2}{\sqrt{\hat p(1-\hat p)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}$$

where $\hat p=\frac{n_1\hat p_1+n_2\hat p_2}{n_1+n_2}$.

So now, in your problem, $\hat p_1=.634$, $\hat p_2=.612$, $n_1=2455$ and $n_2=2730.$

Once you calculate the test statistic, you just need to find the corresponding critical value to compare your test statistic to. For example, if you are testing this hypothesis at the $\alpha = 0.05$ significance level, then you need to compare the absolute value of your test statistic against the critical value $z_{\alpha/2}=1.96$ (for this two-tailed test).

Now, if $|z|>z_{\alpha/2}$, you may reject the null hypothesis; otherwise, you must fail to reject the null hypothesis.
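To make this concrete, here is a minimal sketch of the calculation in R with the numbers from the question (base R only; the built-in `prop.test` performs the same pooled test):

```r
# Pooled two-proportion z-test for the data in the question
k <- c(1556, 1671)                  # successes
n <- c(2455, 2730)                  # trials
p_hat  <- k / n                     # sample proportions
p_pool <- sum(k) / sum(n)           # pooled estimate under H0: p1 = p2
se <- sqrt(p_pool * (1 - p_pool) * (1 / n[1] + 1 / n[2]))
z  <- (p_hat[1] - p_hat[2]) / se
c(z = z, p = 2 * pnorm(-abs(z)))    # z ≈ 1.61, two-tailed p ≈ 0.11

# Built-in equivalent: reports X-squared = z^2 (no continuity correction)
prop.test(k, n, correct = FALSE)
```

Since $|z| \approx 1.61 < 1.96$, the observed difference is not significant at the 5% level.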

This solution works when you are comparing two groups, but it does not generalize to the case where you want to compare 3 groups.

You could, however, use a chi-squared test to check whether all three groups have equal proportions, as suggested by @Eric in his comment above (stats.stackexchange.com/questions/25299/).
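As a sketch of that extension (shown here with the question's two groups; a third group would simply be another row of the table):

```r
# Chi-squared test of equal proportions across the groups (one row per group)
tab <- rbind(group1 = c(1556, 2455 - 1556),   # successes, failures
             group2 = c(1671, 2730 - 1671))
chisq.test(tab, correct = FALSE)   # X-squared ≈ 2.60, df = 1, p ≈ 0.11
```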

Dan
  • Thanks @Dan. As is often the case with Google, knowing the right term to search for is the first hurdle. I did take a look at the chi-squared test. The problem there, as with where I was first getting stuck, is that my expected calculation is based on the sample. I therefore can't provide an expected value, because my samples are used to determine that expected value. – Scott Aug 28 '14 at 18:33
  • @Scott, if your hypothesized proportions for the three groups are that they are all equal then the expected value should be 1/3 for each group. – Dan Aug 28 '14 at 18:35
  • A related explanation of using this test can be found here: http://www.itl.nist.gov/div898/handbook/prc/section3/prc33.htm (currently, the Wikipedia page does not provide a walk-through example). – wwwilliam Apr 18 '17 at 03:29
  • Can someone help me prove the standard deviation of the difference between the two binomial distributions, in other words, prove that $$\sqrt{\hat p (1-\hat p)(\frac{1}{n_1} + \frac{1}{n_2})} = \sqrt{\frac{\hat p_1 (1-\hat p_1)}{n_1} + \frac{\hat p_2 (1-\hat p_2)}{n_2}}$$ – Tanguy Aug 04 '18 at 09:36
  • The answer to my question can be found here: https://stats.stackexchange.com/questions/361015/proof-of-the-standard-error-of-the-distribution-between-two-normal-distributions/361048#361048 – Tanguy Aug 08 '18 at 10:22
  • FYI, this test can be described as a "two-tailed two-proportion pooled z-test". The calculation is described in detail here: https://stattrek.com/hypothesis-test/difference-in-proportions.aspx – user2739472 Sep 13 '20 at 15:01
16

In R, the answer can be calculated with Fisher's exact test:

```r
fisher.test(rbind(c(1556, 2455 - 1556), c(1671, 2730 - 1671)), alternative = "less")
```
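Note that `alternative = "less"` gives a one-sided p-value; for the two-sided hypothesis $H_A: p_1 \neq p_2$ stated in the question, the default alternative can be used:

```r
# Two-sided Fisher test ("two.sided" is the default alternative)
fisher.test(rbind(c(1556, 2455 - 1556), c(1671, 2730 - 1671)))
```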
David Makovoz
  • Would you consider writing a little bit more than providing the R function? Naming the function does not help in understanding the problem, and not everyone uses R, so it would be of no help to them. – Tim Dec 08 '14 at 13:52
  • This is the most exact statistical answer, and it works for small numbers of observations (see the following: http://www.itl.nist.gov/div898/handbook/prc/section3/prc33.htm). – Andrew Mao Mar 30 '15 at 19:36
  • Fisher's exact test: http://en.wikipedia.org/wiki/Fisher's_exact_test – Keith May 22 '15 at 23:59
8

Just a summary:

Dan's and abaumann's answers suggest testing under a binomial model in which the null hypothesis is a single, unified binomial model with its mean estimated from the empirical data. Their answers are correct in theory, but they rely on a normal approximation, since the distribution of the test statistic does not exactly follow a normal distribution. They are therefore only accurate for large sample sizes.

David's answer, on the other hand, suggests a nonparametric exact test, Fisher's exact test (https://en.wikipedia.org/wiki/Fisher%27s_exact_test). It can be applied to small sample sizes, but it is hard to calculate for big sample sizes.

Which test to use, and how much you can trust your p-value, is a judgment call; there are always biases in whichever test you choose.

Dr_Hope
  • Are you trying to suggest that sample sizes in the thousands, with likely parameter values near $1/2$, are *not* large for this purpose? – whuber May 03 '16 at 01:03
  • For this case, I think you could use Dan's method but compute the p-value both in an exact way (binomial) and in an approximate way (normal, rejecting when $Z>\Phi^{-1}(1-\alpha/2)$ or $Z<\Phi^{-1}(\alpha/2)$). – Dr_Hope May 06 '16 at 04:39
  • +1 Not because the sample sizes weren't large enough, but because the answer fits the title question and answers it for any sample size, therefore being useful for readers arriving here guided by the title text (or Google) and having a smaller sample size in mind. – Pere Jul 10 '21 at 10:45
1

Your test statistic is $Z = \frac{\hat{p_1}-\hat{p_2}}{\sqrt{\hat{p}(1-\hat{p})(1/n_1+1/n_2)}}$, where $\hat{p}=\frac{n_1\hat{p_1}+n_2\hat{p_2}}{n_1+n_2}$.

The critical regions are $Z > \Phi^{-1}(1-\alpha/2)$ and $Z<\Phi^{-1}(\alpha/2)$ for the two-tailed test with the usual adjustments for a one-tailed test.
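In R, these critical values are plain normal quantiles, e.g. for $\alpha = 0.05$:

```r
alpha <- 0.05
qnorm(1 - alpha / 2)   # upper critical value, ≈  1.96
qnorm(alpha / 2)       # lower critical value, ≈ -1.96
```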

abaumann
1

Original post: Dan's answer is actually incorrect, no offense intended to anyone. A z-test is used only if your data follow a standard normal distribution. In this case, your data follow a binomial distribution, so use a chi-squared test if your sample is large or Fisher's test if your sample is small.

Edit: My mistake, apologies to @Dan. A z-test is valid here if your variables are independent. If this assumption is not met or is unknown, a z-test may be invalid.

Ryan
  • The "only if" part is an extreme position unlikely to be shared by many. *No* data actually follow a normal distribution. Few data actually behave as if drawn randomly and independently from a normal distribution. Nevertheless, z tests continue to be effective because the distributions of *statistics* (such as the difference of means) to which they apply can be *extremely well* approximated by normal distributions. In fact, the appeal to a $\chi^2$ test relies on the same asymptotic assumptions as a z test does! – whuber Mar 21 '16 at 01:44
  • If you believe in the CLT, then the normal distribution does commonly exist. – Ryan Mar 21 '16 at 02:49
  • @Ryan Well, I believe in the CLT, but it doesn't say anything about n = 30 or n = 300 or n = 5000. You don't actually get normality unless you somehow manage to have infinite sample sizes, or you somehow started with normality. Questions about how close we are to normality when taking averages are not addressed by the CLT. (We can consider those questions, but we don't use the CLT to find out whether the approximation is any good.) – Glen_b Jul 26 '16 at 05:12
0

As suggested in other answers and comments, you can use an exact test that takes into account the origin of the data. Under the null hypothesis that the probability of success $\theta$ is the same in both experiments,

$P \bigl(\begin{smallmatrix}k_1 & k_2 \\ n_1-k_1 & n_2-k_2\end{smallmatrix}\bigr) = \binom{n_1}{k_1}\binom{n_2}{k_2}\theta^{{k_1 + k_2}}\left({1-\theta}\right)^{{\left(n_1-k_1\right)+\left(n_2-k_2\right)}}$

Notice that $P$ is not the p value, but the probability of this particular result under the null hypothesis. To calculate the p value, we need to consider all the cases whose $P$ is not higher than that of our result. As noted in the question, the main problem is that we do not know the value of $\theta$; this is why it is called a nuisance parameter.

Fisher's test solves this problem by conditioning: the only contingency tables considered in the calculation are those in which the total number of successes is the same as observed ($1556 + 1671 = 3227$). This conditioning may not be in accordance with the experimental design, but it also means that we do not need to deal with the nuisance parameter.

There are also unconditional exact tests. For instance, Barnard's test estimates the most likely value of the nuisance parameter and directly uses the binomial distribution with that parameter. Obviously, the problem here is how to calculate $\theta$, and there may be more than one answer to that. The original approach is to find the value of $\theta$ that maximizes $P$. Here you can find an explanation of both tests.

I have recently uploaded a preprint that employs a similar strategy to that of Barnard's test. However, instead of estimating $\theta$, this method (tentatively called the m-test) considers every possible value of the parameter and integrates over all the results. Using the same notation as in the question,

$P \bigl(\begin{smallmatrix}k_1 & k_2 \\ n_1-k_1 & n_2-k_2\end{smallmatrix}\bigr) = \binom{n_1}{k_1}\binom{n_2}{k_2}\int_{0}^{1}\theta^{{k_1 + k_2}}\left({1-\theta}\right)^{{\left(n_1-k_1\right)+\left(n_2-k_2\right)}}d\theta$
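For reference, this integral has a closed form, since it is a Beta function:

$$\int_{0}^{1}\theta^{a}\left(1-\theta\right)^{b}\,d\theta = B(a+1,\,b+1) = \frac{a!\,b!}{(a+b+1)!}$$

with $a = k_1 + k_2$ and $b = (n_1-k_1)+(n_2-k_2)$ here.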

The calculation of the p value can be simplified using the properties of the integral, as shown in the article. Preliminary tests with Monte Carlo simulations suggest that the m-test is more powerful than the other exact tests at different significance levels. As a bonus, this test can easily be extended to more than two experiments, and also to more than two outcomes. The only limitation is speed, as many cases need to be considered. I have also prepared an R package for the test (https://github.com/vqf/mtest). In this example,

```r
> library(mtest)
> m <- matrix(c(1556, 2455 - 1556, 1671, 2730 - 1671), nrow = 2, byrow = FALSE)
> m.test(m)
[1] 0.0837938
```

On my computer, this takes about 20 seconds, whereas Barnard's test takes much longer.

vqf