
Suppose there is a nucleotide position in a DNA sequence capable of taking 2 values, either A (adenine) or T (thymine).

Suppose that in a sample of 2,000 people (4,000 chromosomes in total, because each person carries 2 copies of the chromosome in question) 56% of the chromosomes have an "A" at the position and 44% have a "T".

In terminology that geneticists use, A is the "major allele" and T is the "minor allele."

However, now suppose a geneticist wants to determine the probability that, based on his sample of 2,000 people, "A" has been misdesignated the "major" allele. In other words, the sample mean for "A" was 56%, but the geneticist wishes to know the probability that the population mean for "A" is actually 0.499 or less.

Edit: First, my original question appears to have been unclear, so I have edited the entirety of the body. My hope is that this will make the question intelligible. Next, my apologies if this question is redundant. Before posting this question, I scrolled through all of the "similar questions" but I did not find an exact corollary. However, since this question cannot be deleted, I resolved to re-post it in a manner that is clearer, even if it is redundant.

Vincent Laufer
  • Instead of "probability that the coin is unfair" perhaps consider the probability that a fair coin produced the observed result. (What exactly would the former mean anyway?) – P.Windridge Jul 24 '17 at 14:10
  • A similar question has been [asked before](https://stats.stackexchange.com/questions/291017/fair-coin-testing-combine-the-results-or-not#comment555146_291017). – Digio Jul 24 '17 at 14:24
  • Your problem statement is ungrammatical and therefore too vague to be answerable. Do you wish to "calculate the probability of success" from the data; do you wish to test whether that chance is "something other than 0.5;" or do you wish to determine whether "p(success) > P"? All three of those cases are amply addressed in other threads here about coin-flipping, Binomial probability distributions, estimation and hypothesis testing, so I encourage you to search for the answers. – whuber Jul 24 '17 at 15:01
  • @Vincent Laufer: Deriving the exact value of the probability of a Bernoulli experiment empirically, based on frequency alone, is an impossible feat unless you can gather a "sample" of infinitely many individuals and repeat your experiment an infinite number of times. So when you say "determine the probability" I assume that you don't mean that literally and, if that is the case, your problem is a simple scenario of statistical hypothesis testing, with more than one way to set up your experiment. – Digio Jul 25 '17 at 12:56
  • the question has now been edited and rephrased. thank you. – Vincent Laufer Jul 25 '17 at 15:39
  • @Vincent Laufer My comment was on the rephrased question and I've given you the answer. What you want is a [one-sample test of proportions](https://onlinecourses.science.psu.edu/stat200/node/53). – Digio Jul 27 '17 at 09:01

2 Answers


To test whether the coin is fair, you could use either a $\chi^2$-test or a binomial test.
If you use R, both tests are included in the stats package:

binom.test(2700,5000)
# Exact binomial test
# data:  2700 and 5000
# number of successes = 2700, number of trials = 5000, p-value = 0.00000001646
# alternative hypothesis: true probability of success is not equal to 0.5
# 95 percent confidence interval:
#   0.5260654 0.5538879
# sample estimates:
#   probability of success 
# 0.54 

chisq.test(c(2300,2700))
# Chi-squared test for given probabilities
# 
# data:  c(2300, 2700)
# X-squared = 32, df = 1, p-value = 0.00000001542

As you can see, both tests suggest rejecting the null hypothesis of a "fair" coin at any sensible significance level.
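As a rough check on where the `X-squared = 32` figure comes from, the statistic can be recomputed by hand. This is only a sketch that assumes the same counts as above (2300 and 2700) and equal expected counts under the null; the variable names are illustrative.

```r
# Goodness-of-fit statistic for a "fair coin" null, computed by hand.
observed <- c(2300, 2700)                 # counts used in chisq.test() above
expected <- rep(sum(observed) / 2, 2)     # 2500 and 2500 under the null

# Sum of (observed - expected)^2 / expected over both outcomes.
x_squared <- sum((observed - expected)^2 / expected)
x_squared                                 # 32

# Upper tail of the chi-squared distribution with 1 degree of freedom.
p_value <- pchisq(x_squared, df = 1, lower.tail = FALSE)
p_value
```

The resulting p-value should agree with the one reported by `chisq.test(c(2300, 2700))`.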

Eldioo
  • Hello Eldioo, thank you very much for your answer. However, the extra step that I am having trouble with is the need to sum over all possible values of P [whether p(success) > P implies a continuous range of values]. – Vincent Laufer Jul 24 '17 at 14:58
  • @VincentLaufer Do you need to solve the problem by hand? – Pere Jul 24 '17 at 15:36

I wanted to address a question that remained for me after reading @Digio's comments and @Eldioo's answer.

Two different answers were proposed.

First, Eldioo answered the question before it was rephrased. Applied to the rephrased question (2240 "A" alleles out of 4000 chromosomes, tested one-sidedly), Eldioo's approach would be:

binom.test(2240, 4000, alternative="greater")

This returns

    Exact binomial test

data:  2240 and 4000
number of successes = 2240, number of trials = 4000, p-value = 1.694e-14
alternative hypothesis: true probability of success is greater than 0.5
95 percent confidence interval:
 0.5469358 1.0000000
sample estimates:
probability of success 
                  0.56 
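For readers wondering what the one-sided exact test sums over, the sketch below recomputes the p-value directly as the upper tail of a Binomial(4000, 0.5) distribution, i.e. the probability of seeing 2240 or more "A" alleles if the true proportion were 0.5. The variable names are illustrative only.

```r
# One-sided exact binomial p-value as an explicit tail probability.
x <- 2240      # observed "A" alleles (56% of 4000)
n <- 4000      # chromosomes sampled

# P(X >= x) under the null p = 0.5; pbinom(x - 1, ...) with
# lower.tail = FALSE gives P(X > x - 1) = P(X >= x).
p_tail <- pbinom(x - 1, size = n, prob = 0.5, lower.tail = FALSE)

# The same quantity as reported by binom.test().
bt <- binom.test(x, n, p = 0.5, alternative = "greater")
all.equal(bt$p.value, p_tail)
```

Note that this is the probability of the data under the null hypothesis, not the probability that the null hypothesis is true.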

However, another approach was proposed. Based on Digio's comments, we can use a one-sample test of a proportion as well. We can enter:

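phat <- 0.56   # sample proportion of "A"
pnot <- 0.50   # proportion under the null hypothesis
n <- 4000      # number of chromosomes
numerator <- phat - pnot
denominator1 <- pnot * (1 - pnot)
denominator2 <- sqrt(denominator1 / n)   # standard error under the null
Z <- numerator / denominator2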

and the Z-value is 7.589466.

We can then write:

pnorm(-abs(Z))

and we obtain 1.606133e-14. This is very similar to the value of 1.694e-14 given to us by the exact binomial test.
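As a cross-check on the hand calculation, the same one-sided z-test can be run through `prop.test()` from the stats package with the continuity correction switched off. This is a sketch using the counts from the question (2240 of 4000); with `correct = FALSE`, the reported `X-squared` should equal the square of the z-statistic and the p-value should match the normal tail probability.

```r
# One-sample test of a proportion, by hand and via prop.test().
x <- 2240; n <- 4000; p0 <- 0.50

# z-statistic using the standard error under the null hypothesis.
z <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)
p_manual <- pnorm(z, lower.tail = FALSE)   # one-sided upper-tail p-value

# Same test without Yates' continuity correction.
pt <- prop.test(x, n, p = p0, alternative = "greater", correct = FALSE)

all.equal(unname(pt$statistic), z^2)       # X-squared is z squared
all.equal(pt$p.value, p_manual)
```

Leaving `correct` at its default of `TRUE` applies the continuity correction and gives a slightly larger p-value.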

Vincent Laufer