
Two people each flip a coin 100 times. The first person gets 42 heads; the second person gets 58 heads.

What is the probability of this outcome and all other possible outcomes that have an equal or lower probability of occurring?

The probability of the specific outcome is straightforward to calculate as the product of two binomial probabilities (I get p = 0.02229 * 0.02229 = 0.000497). There are 9800 possible outcomes that have a probability of 0.000497 or lower, out of a total of 101 x 101 = 10201 possible outcomes. When I sum the probabilities of all 9800 of these outcomes, I get 0.077687.

Given the number of trials involved, I would expect a z-test for the difference in proportions with continuity correction (or, if you prefer, a chi-square test with 1 degree of freedom) to give nearly the same probability, but it does not: I get z = 2.1213 and p = 0.033895 (the chi-square statistic is 4.5).
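
For reference, the z-statistic and p-value quoted above can be reproduced with a short R sketch. It assumes a common null proportion of 0.5 for both coins, which is how the poster later describes the standard error in the comments; the variable names are illustrative only. `prop.test` is included as a cross-check, since its continuity-corrected chi-square statistic should equal z squared in this symmetric case.

```r
# Sketch: two-sample z-test for a difference in proportions with
# continuity correction, assuming a common null proportion of 0.5.
x1 <- 42   # heads for the first person
x2 <- 58   # heads for the second person
n  <- 100  # tosses per person
p0 <- 0.5  # assumed null proportion (also the pooled estimate here)

# Difference in proportions, each count moved 0.5 toward the other
diff_cc <- ((x2 - 0.5) - (x1 + 0.5)) / n

# Standard error of the difference under the null
se <- sqrt(p0 * (1 - p0) / n + p0 * (1 - p0) / n)

z <- diff_cc / se
p_two_sided <- 2 * pnorm(-abs(z))

# Cross-check with the built-in test (Yates continuity correction)
chisq <- prop.test(x = c(x1, x2), n = c(n, n), correct = TRUE)

print(c(z = z, p = p_two_sided))
print(chisq$statistic)
```

Both routes should agree with the figures quoted in the question (z near 2.1213, chi-square statistic of 4.5).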

I can only assume I am making some type of logical error: shouldn't the product of two binomials converge to a normal distribution as the number of trials increases? Note that I have rounded some of the probabilities above, but they are not rounded in the spreadsheet I used for the calculations.

user221943
  • When you say "What is the probability of this outcome and all other possible outcomes that have an equal or lower probability of occurring?" do you mean "What is the probability that the first and second person got this many heads *or fewer*"? Otherwise you are asking "What is the probability that both A and B got either < 43 or > 57 heads?". – Slow loris May 18 '16 at 17:06
  • The CLT applies when you have an increasing number of random variables, in that their mean becomes more and more normally distributed. In your case, you only have 2 random variables. – FisherDisinformation May 18 '16 at 17:08
  • Also, either way, to compute those probabilities you can use the cumulative binomial distribution. In R it is given by the `pbinom` function. So, for example, to compute the probability that someone gets 42 or fewer heads from 100 fair coin tosses, you would do `pbinom(q=42, size=100, prob=0.5)`. To compute the probability of MORE than 42 heads, you just add the `lower.tail=FALSE` argument. – Slow loris May 18 '16 at 17:13
  • (1) How do you come up with $p=0.003895$ for $Z=2.1213$? There must be a typographical error in there somewhere: that $p$ is an order of magnitude too small. (2) I find $9797$ outcomes with equal or smaller probabilities. Their total probability is $0.076196\ldots$. How do you find only $401$? (3) This is not a suitable critical region for the difference in proportions. – whuber May 18 '16 at 17:41
  • To Slow loris: I am asking what the probability is of all possible outcomes that have an equal or lower probability of occurrence. So, for example, person A getting 43 heads and person B getting 59 heads would be included in the sum, because the probability of this outcome is .000477, which is lower than the probability of our outcome of 42 and 58 heads (.000497). – user221943 May 18 '16 at 18:29
  • To Whuber: My mistake, 401 is the number of outcomes with a higher probability of occurring, so the number <= should be 9599 by my count. My p-value of .003895 is the probability of Z <= -2.1213 or Z >= 2.1213 (two-tailed). I calculate Z as p-diff/(standard error). p-diff with continuity correction is ((58-0.5) - (42+0.5))/100 = 0.15. Standard error is sqrt of [(0.5*(1-0.5))/100 + (0.5*(1-0.5))/100] = 0.070711 – user221943 May 18 '16 at 18:53
  • Sorry, there is typo on my p-value, should be 0.03895! – user221943 May 18 '16 at 19:12
  • OK, those remarks clear up some of the issues. But have you noticed the problem with the critical region? For instance, you are including outcomes like 97 heads for the first person and 97 heads for the second person within your region, even though this does not provide any evidence at all that the coins differ. That's why it's not relevant to compare your calculations to what the test of proportions is saying. – whuber May 18 '16 at 19:22
  • Now I am embarrassed... the p-value I calculated is 0.033895. I have corrected this, and my count of outcomes with probabilities <= that of the 42, 58 outcome, in the original post – user221943 May 18 '16 at 19:23
  • whuber: yes, I see your point, but it does show evidence that the proportion is not 0.5 for both. Since we specifically use the overall proportion of 0.5 in calculating the z-stat, isn't that what's really being tested? Perhaps this is where my logic is wrong – user221943 May 18 '16 at 19:28
  • If I exclude all occurrences where the numbers of heads are the same, the probability goes down to 0.075792... still way different from the z-test p-value – user221943 May 18 '16 at 19:32
  • Maybe the better question would be: what outcomes should be summed to do essentially what we are doing for a z-test of proportions? BTW, my count of outcomes with equal or lower probability is actually 10073... I have a 101 x 101 table of outcomes, of which only 128 have a higher probability of occurring. – user221943 May 18 '16 at 20:25
  • Interestingly, I get very close to the z-test p-value if I take half the value I calculate through summing the product of the binomials... perhaps I am double counting if I add, for example, the probability of: A getting 30 heads and B getting 40 heads; and A getting 40 heads and B getting 30 heads? – user221943 May 18 '16 at 20:35
  • No, you're not double-counting. Look at the thread on p-values at http://stats.stackexchange.com/questions/31. My answer there, although long, was posted expressly to explain what the logic is and how appropriate critical regions are (conceptually) found. – whuber May 18 '16 at 20:54
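
One hedged reading of the critical-region point made in the comments: a test of whether two proportions differ treats a pair of counts as extreme when the counts are far apart, not when the pair is jointly improbable. The R sketch below assumes both coins are fair and uses illustrative names (`pmat`, `d`, `p_diff`, `p_check`). It sums the joint probabilities of all pairs whose absolute difference is at least 58 - 42 = 16. It is offered as a region comparable in spirit to the z-test, not as the definitive exact test.

```r
# Sketch: exact probability that two independent fair-coin head counts
# (100 tosses each) differ by at least 16.
n <- 100
pall <- dbinom(0:n, n, 0.5)

# pmat[a + 1, b + 1] = P(A = a) * P(B = b), assuming independence
pmat <- outer(pall, pall)

# d[a + 1, b + 1] = |a - b|
d <- abs(outer(0:n, 0:n, "-"))

# Sum over the region where the counts are at least 16 apart
p_diff <- sum(pmat[d >= 16])

# Cross-check: A + (n - B) is Binomial(2n, 0.5) for fair coins, so
# |A - B| >= 16 is the same event as that sum being <= 84 or >= 116.
p_check <- 2 * pbinom(84, 2 * n, 0.5)

print(c(p_diff = p_diff, p_check = p_check))
```

Under these assumptions the result should land noticeably closer to the z-test p-value than to the sum over jointly improbable pairs, because pairs such as (97, 97) no longer count as evidence of a difference.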

1 Answer


There might be more than one way that "outcome" can be interpreted in this context. But here's one way:

> # Assuming fair coin
> # Probability of getting 48 heads from 100 tosses
> p48 <- dbinom(48, 100, .5)
> p48
[1] 0.07352701
> 
> # Probability of getting 58 heads from 100 tosses
> p58 <- dbinom(58, 100, .5)
> p58
[1] 0.02229227
> 
> # Combined probability of both outcomes
> # Assuming independence
> pobs <- p48 * p58
> pobs
[1] 0.001639084
> 
> 
> # Probability of getting x heads out of 100
> # where x is 0 to 100
> pall <- dbinom(0:100, 100, .5)
> 
> # probability of a pair of possible outcomes
> pmat <- outer(pall, pall)
> 
> # sum of probability of all possible outcomes
> # with probabilities equal or less than that obtained
> sum(pmat[pmat <= pobs])
[1] 0.2583799
Jeromy Anglim
  • Hi Jeromy, thank you for your input. I get the same results as you for 48 heads and 58 heads. My original post and the numbers I calculated were for 42 heads and 58 heads. I concocted this specific example because I wanted to compare my calculation of the probability (which appears to be consistent with your calculations) with the p-value from a Z-test of the difference in proportions, thinking that given the fairly large sample sizes, the two probabilities should be very close - but they are not! The Z-test of the difference in proportions gives a p-value of 0.033895. Any thoughts on the discrepancy? – user221943 May 19 '16 at 16:47
  • fyi my Z-test p-value of 0.033895 is with continuity correction....not very close to direct computation of p-value of 0.077687 using product of binomials approach.... – user221943 May 19 '16 at 17:29
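
Since the answer's session uses 48 heads while the question concerns 42, the same approach can be rerun with the question's numbers. This is a sketch under the answer's assumptions (fair coins, independence). The small relative tolerance `tol` and the names `in_region`, `n_region` and `p_region` are additions of this sketch; the tolerance is meant to keep floating-point ties, such as the (42, 42) and (58, 58) cells, inside the region.

```r
# Sketch: the answer's approach, rerun for 42 and 58 heads.
pall <- dbinom(0:100, 100, 0.5)

# Probability of the observed pair (index = heads + 1)
pobs <- pall[42 + 1] * pall[58 + 1]

# Joint probability of every pair of head counts
pmat <- outer(pall, pall)

# Pairs no more probable than the observed one; tol guards against
# ties being dropped by floating-point rounding
tol <- 1e-9
in_region <- pmat <= pobs * (1 + tol)

n_region <- sum(in_region)        # number of pairs in the region
p_region <- sum(pmat[in_region])  # their total probability

print(c(pobs = pobs, n_region = n_region, p_region = p_region))
```

The count and total printed here can be compared directly with the figures debated in the question's comments.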