Sample size calculation for estimating coin fairness

Question

Suppose that someone gives you a coin with some unknown weighting (maybe it's a fair coin, or maybe it lands heads only 25% of the time, etc.). How many times would you have to flip the coin to determine, within some confidence, the weighting of the coin (the probability that the coin lands heads in general)?

Edit: To clarify, I'm looking to see how the coin is loaded/weighted: for example, an "unloaded" coin would be weighted such that the probability of heads is 50%, whereas a "loaded" coin may have a heads probability of 70%, 90%, etc. I want to know when I can stop flipping the coin with some level of confidence that the probability of heads is x% (where x% is calculated from the data, not assumed before having data).

Edit: To clarify further, I'll give an example: suppose I have some system that outputs a 0 or a 1 after each trial. After 5 trials, I end up with {01111}. The two questions this raises are: (a) how do I find the probability that the next result is a 1, given only my 5 previous trials, and (b) how do I find the confidence of the calculation performed in (a)? (Based on the answers given so far, I'm guessing I can use a confidence level as a stopping point, i.e., once a confidence level of 80% is reached I can stop doing more trials.)

Thank you in advance for your help.

My question before was meant to give me some time to find [this video](https://d18ky98rnyall9.cloudfront.net/s8PvmhCWEea9QxKG-RHQIQ.processed/full/540p/index.webm?Expires=1471392000&Signature=ixdU5w5LDLHCl72Mjgh~tQTyccHID0maOVEe4m5CyVfMAkG~sYKlp8K4TAGJ~40vYxuRQ9wg~UfhhESlgeQbxUy0aWMoVL9kWcPwigLIugc7sWyYbyprn1vE-O-LnHmw5E1LlJUQkRtkmul2vOf9ML1Zy35l9Q9v1UQjmwaiB1I_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A), which explains the Bayesian approach to the problem you are bringing up. — Antoni Parellada, Aug 15 '16 at 18:56
As opposed to the frequentist approach also explained [here](https://d18ky98rnyall9.cloudfront.net/s80-axCWEea9QxKG-RHQIQ.processed/full/540p/index.webm?Expires=1471392000&Signature=JuAQZHYHDqbbH0udYhVNVYk2r0gyxH~wfNQIM8GfjootDc8pufGZGIwjiYAIfkOJzNzX4tBxO8B4At1oTN~9DWbl8LI8DA1iX5HGCNgV3PUd7S91GKJAkdyO7AMMKaW4Nk61POghshbTkMFF-vd~K8imCo9rGLJevgnQjTIiew4_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A). — Antoni Parellada, Aug 15 '16 at 18:56
Is it a theoretical or a practical problem? If practical, then the answer is: you do not need to throw it at all, since a biased coin is an impossibility (cf. http://stats.stackexchange.com/questions/153076/is-tossing-a-coin-a-fair-way-of-randomising-a-group-into-two-groups/153080#153080 ) — Tim, Aug 15 '16 at 19:28
@Tim, it's a theoretical problem: in reality, I'm working with something different from coins, but this was the simplest / most direct means by which I could phrase my problem. — mwarrior, Aug 16 '16 at 19:19
@AntoniParellada, I do suspect that the coin is biased (in reality, the thing I'm dealing with isn't actually a coin; it just happens to be something that can only give me two possible outcomes whenever tested, so a coin was a straightforward way of framing the problem). — mwarrior, Aug 16 '16 at 19:20
Great! In this case I suspect that the videos I linked could really come in handy. — Antoni Parellada, Aug 16 '16 at 19:23
@AntoniParellada, thanks for the video, it is helpful. However, it has led me to believe that maybe I didn't communicate my question well. I'm not looking to determine whether the coin is or isn't loaded; rather, I'm looking to see _how_ the coin is loaded/weighted: for example, an "unloaded" coin would be weighted such that the probability of heads is 50%, whereas a "loaded" coin may have a heads probability of 70%, 90%, etc. I want to know when I can stop flipping the coin with some level of confidence that the probability of heads is x%. — mwarrior, Aug 16 '16 at 19:36
Thanks so much for your help so far and bearing with me. I think I didn't do well articulating the question before. I've edited it to better reflect what I'm looking for. — mwarrior, Aug 16 '16 at 20:58

J. R. C. · Answer 1 · 2016-08-16T13:50:56.737

2

It is fairly well explained here:

https://en.wikipedia.org/wiki/Checking_whether_a_coin_is_fair

In short, using Bayesian inference with a uniform prior, which is reasonable (it represents maximum initial uncertainty about the fairness of the coin), the posterior density of the actual probability $r$ of obtaining heads in a single toss, after observing $h$ heads and $t$ tails (so $n=h+t$ is the total number of tosses), is a Beta distribution with parameters $\alpha=h+1$ and $\beta=t+1$:

$$f(r|H=h,T=t)=\frac{(h+t+1)!}{h!t!}r^h(1-r)^t$$

This gives you an idea of how $r$ is distributed. The maximum a posteriori estimate (the mode of the posterior) is

$$r^*(h,t)=\frac{h}{h+t}$$

And the expected value is

$$E[r](h,t)=\frac{h+1}{h+t+2}$$
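
As an illustration, the two estimates above can be evaluated for the question's example sequence {01111}. This is a minimal sketch, not part of the original answer: the variable names are made up for the example, the 80% level is borrowed from the question, and the interval comes from `qbeta` applied to the Beta$(h+1,t+1)$ posterior. Under this model the posterior mean is also the probability that the next trial gives a 1.

```r
# Posterior summaries for the Beta(h + 1, t + 1) posterior (uniform prior).
# The data are the question's example sequence {0, 1, 1, 1, 1}.
trials  <- c(0, 1, 1, 1, 1)
n_heads <- sum(trials == 1)   # h in the formulas above
n_tails <- sum(trials == 0)   # t in the formulas above

post_mode <- n_heads / (n_heads + n_tails)            # maximum a posteriori
post_mean <- (n_heads + 1) / (n_heads + n_tails + 2)  # expected value

# Equal-tailed 80% credible interval (the level is an illustrative choice).
cred_int <- qbeta(c(0.10, 0.90), n_heads + 1, n_tails + 1)

print(c(mode = post_mode, mean = post_mean))
print(cred_int)
```

With only five trials the interval is wide, which is the sense in which more flips are needed before the estimate can be trusted.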

One can use the standard deviation as an estimate of the uncertainty

$$\sigma(h,t)=\sqrt{\frac{(h+1)(t+1)}{(h+t+2)^2(h+t+3)}}$$

As you can see it doesn't depend just on the total number of tosses $h+t$ but also on $h$, so the criterion for a given confidence interval will be different depending on the sequence of results of the tosses.
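
One way to turn this into a stopping rule is to keep flipping until a credible interval of the posterior is narrow enough. The sketch below is only one possible rule under stated assumptions, not a procedure given in this answer: the function name `flip_until_precise`, the width threshold of 0.1, the 95% level, and the simulated coins are all hypothetical choices.

```r
# Hypothetical stopping rule: keep flipping until the equal-tailed credible
# interval of the Beta(h + 1, t + 1) posterior is narrower than max_width.
flip_until_precise <- function(true_p, max_width = 0.1, level = 0.95,
                               max_flips = 10000) {
  tail_prob <- (1 - level) / 2
  n_heads <- 0
  n_tails <- 0
  for (i in seq_len(max_flips)) {
    # Simulate one flip of a coin whose heads probability is true_p.
    if (rbinom(1, 1, true_p) == 1) {
      n_heads <- n_heads + 1
    } else {
      n_tails <- n_tails + 1
    }
    ci <- qbeta(c(tail_prob, 1 - tail_prob), n_heads + 1, n_tails + 1)
    if (ci[2] - ci[1] < max_width) break
  }
  list(flips = n_heads + n_tails, heads = n_heads,
       estimate = (n_heads + 1) / (n_heads + n_tails + 2), interval = ci)
}

set.seed(1)  # arbitrary seed, only for reproducibility
fair   <- flip_until_precise(true_p = 0.5)
loaded <- flip_until_precise(true_p = 0.9)
print(c(fair = fair$flips, loaded = loaded$flips))
```

A strongly loaded coin should reach the target width in fewer flips than a fair one, because the posterior standard deviation is largest when heads and tails are balanced.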

edited Aug 16 '16 at 13:50

answered Aug 15 '16 at 19:19


I guess that you meant that h/(h+t) is the maximum *likelihood* estimator? Expected value of beta-binomial model with "uniform" Beta(1,1) prior is given by you in next line: (h+1)/(h+t+2). – Tim Aug 15 '16 at 19:26
What I meant is that h/(h+t) is the value of r that maximizes the posterior distribution. Since it is a skewed distribution, the maximum and the mean take different values. It is usual to use the term "maximum a posteriori" in Bayesian models, which literally means the value that produces the maximum of the posterior. – J. R. C. Aug 15 '16 at 23:00

You are updating the Beta$(1,1)$ distribution. The posterior is a Beta$(h+1,t+1)$ distribution. It is therefore immediate that $\sigma(h, t)$ (which I write instead of "$\sigma(r)$" because it depends on $h$ and $t$) is the standard deviation of the Beta$(h+1, t+1)$ distribution, which has a [simple closed formula](https://en.wikipedia.org/wiki/Beta_distribution). – whuber Aug 15 '16 at 23:33
So you meant the mode of the beta distribution. I think this needs clarifying, as does where the numbers come from (a beta-binomial model, not the binomial itself). Also: a uniform prior is not "lack of prior"! – Tim Aug 16 '16 at 04:28
Thanks for the comments, I improved the answer accordingly. – J. R. C. Aug 16 '16 at 13:52
@J.R.C.: So based on this, is there a way of dynamically determining when I've reached a sufficient number of tosses, given the outcomes of tosses done so far? For example, if I flip a coin and keep getting nothing but heads, at what point can I be content for a given confidence interval? – mwarrior Aug 16 '16 at 19:23
It's getting better, but your final remark is puzzling: $h$ does not represent the sequence of tosses; it's merely the total number of heads. *Of course* the estimate depends on how many heads show up! Regardless, although you're on track to answer the question, you haven't yet addressed its main concern: what *sequential decision procedure* should be used? – whuber Aug 16 '16 at 20:25
@J.R.C. thanks so much for your help so far and bearing with me. I think I didn't do well articulating the question before. I've edited it to better reflect what I'm looking for. – mwarrior Aug 16 '16 at 20:57

score 0 · Answer 2 · edited Apr 13 '17 at 12:44

I've been thinking about the Bayesian approach, but I believe now that it may end up being more theoretical or academic than intended in this very particular instance.

I wonder if what you are after (and I realize that I am working around some of the more specific questions at the end of your post) is to set up some statistical power calculations. You can do this with R, or online stats calculators, but the important concept is the flexibility in generating results that tell you in what percentage of cases you can expect to correctly call the coin biased depending on: 1. The number of tosses; 2. The degree of bias in the coin; and 3. The percentage of cases we are ready to accept as the risk of actually calling the coin biased when it is not (risk alpha).

Knowing that even a fair coin can produce extremely surprising results, you can settle for a compromise: you will only call the coin "biased" if the result goes beyond a threshold chosen so that you make this mistake only $\small 5\%$ of the time when the coin is in fact unbiased (risk alpha).
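
To make the threshold concrete, here is a small sketch of how such a rejection region could be computed for a fixed number of tosses. The 100 tosses are an arbitrary illustrative choice, and the variable names are not from any quoted source. Because the binomial distribution is discrete, the actual false-alarm rate comes out at or below the nominal 5%.

```r
# Two-sided rejection region for H0: P(heads) = 0.5 at nominal alpha = 0.05.
ntoss <- 100   # illustrative number of tosses
alpha <- 0.05
lower <- qbinom(alpha / 2, ntoss, 0.5)       # call "biased" if heads < lower
upper <- qbinom(1 - alpha / 2, ntoss, 0.5)   # ... or if heads > upper

# Actual probability of calling a fair coin "biased" with this region.
actual_alpha <- pbinom(lower - 1, ntoss, 0.5) +
  pbinom(upper, ntoss, 0.5, lower.tail = FALSE)

print(c(lower = lower, upper = upper, actual_alpha = actual_alpha))
```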

But what you want to know is, in some respects, a complementary idea: if the coin is biased, in what percentage of cases will you be able to make the call under the self-imposed constraint in the previous paragraph? In other words, the power. I have some notes here if you are interested.

Evidently, your power to confidently reject the coin's fairness will increase as you increase the number of tosses, or if you happen to be dealing with a more extremely biased coin.

I am not going to re-invent the wheel by reformulating what is a very straightforward chunk of code that you can find in many guises, for example in this Berkeley University post, and that I'm pasting for ease of access:

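```r
set.seed(17) # Today's date.
coin.power = function(ntoss=100, nsim=1000, prob=.5){
    # Two-sided 5% rejection region under the fair-coin null hypothesis
    lower = qbinom(.025, ntoss, .5)
    upper = qbinom(.975, ntoss, .5)
    # Simulate nsim experiments of ntoss tosses with true P(heads) = prob
    rr = rbinom(nsim, ntoss, prob)
    # Fraction of simulated experiments that reject fairness (estimated power)
    sum(rr < lower | rr > upper) / nsim
}
ntosses = c(10,100,200,500,600,800,1000,1500,2000,2500)
res = sapply(ntosses, coin.power, prob=.55)
names(res) = ntosses
res
```

```
   10   100   200   500   600   800  1000  1500  2000  2500
0.032 0.133 0.259 0.634 0.653 0.799 0.867 0.969 0.994 0.999
```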

For a coin whose true probability of heads is $\small P(H)=55\%$, then, you can expect to correctly reject the hypothesis that the coin is fair $\small 99.4\%$ of the time if you toss the coin $\small 2,000$ times, and $\small 96.9\%$ of the time with $\small 1,500$ tosses.
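
The simulated figures can be cross-checked without random numbers by summing binomial tail probabilities directly. The following is a sketch under the same rejection rule as `coin.power`; the function name `exact.power` is introduced here for illustration and is not from the quoted post.

```r
# Exact power of the two-sided test of H0: P(heads) = 0.5, using the same
# rejection region as coin.power() but binomial tail sums instead of simulation.
exact.power <- function(ntoss = 100, prob = 0.5, alpha = 0.05) {
  lower <- qbinom(alpha / 2, ntoss, 0.5)
  upper <- qbinom(1 - alpha / 2, ntoss, 0.5)
  # P(heads < lower) + P(heads > upper) when the true P(heads) is prob
  pbinom(lower - 1, ntoss, prob) +
    pbinom(upper, ntoss, prob, lower.tail = FALSE)
}

ntosses <- c(10, 100, 200, 500, 600, 800, 1000, 1500, 2000, 2500)
exact <- sapply(ntosses, exact.power, prob = 0.55)
names(exact) <- ntosses
print(round(exact, 3))
```

The exact values should sit close to the simulated ones, with the differences reflecting Monte Carlo noise from using 1,000 simulations per sample size.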

If the coin is more biased, for example, $\small P(H)=70\%$, you can expect virtual certainty with $\small 200$ tosses:

```r
res = sapply(ntosses, coin.power, prob=.70)
names(res) = ntosses
res
```

```
   10   100   200   500   600   800  1000  1500  2000  2500
0.163 0.976 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
```
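
Going the other way, from a target power to a number of tosses, a common shortcut is the normal approximation to the binomial. The sketch below is not from the quoted post: the function name `approx.ntoss` and the 80% power target are illustrative assumptions, and the result is only approximate because the exact test is discrete.

```r
# Approximate number of tosses needed to detect a true P(heads) = prob
# against H0: P(heads) = p0 with a two-sided test (normal approximation).
approx.ntoss <- function(prob, power = 0.8, alpha = 0.05, p0 = 0.5) {
  z_alpha <- qnorm(1 - alpha / 2)   # critical value for the test
  z_power <- qnorm(power)           # quantile for the desired power
  ceiling(((z_alpha * sqrt(p0 * (1 - p0)) +
              z_power * sqrt(prob * (1 - prob))) / (prob - p0))^2)
}

n55 <- approx.ntoss(0.55)
n70 <- approx.ntoss(0.70)
print(c("P(H)=0.55" = n55, "P(H)=0.70" = n70))
```

For $\small P(H)=55\%$ the approximation should land in the high hundreds of tosses, which is in line with the simulated power of about 0.8 at 800 tosses above.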
