
I am trying to figure out how to run a hypothesis test based on a binomially distributed random variable, using R.

I have a control sample of size 1000. The random variable is binary. The probability of success estimated from this sample is 1.0, i.e., all 1000 values equal one.

I have a test sample of size 100, where the estimated probability of success is 0.98, i.e., there are 98 ones and 2 zeros.

I wish to check whether 98 out of 100 is probable given a prior probability of 1.0.

What I want to do is: `sum(dbinom(x = c(0:98), size = 100, prob = 1.0, log = FALSE))`, i.e., I sum the probability mass over the support of the left tail.

However, I get zero as a result, since with $p = 1$ every term contains the factor $(1-p)^{n-k} = 0$ for $k < n$ (see https://en.wikipedia.org/wiki/Binomial_distribution).
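For concreteness, here is the computation as a short R snippet (same numbers as above); every term in the sum is zero:

```r
## Left tail under the null prob = 1: dbinom(x, size, prob = 1) is 0
## for every x < size, so the whole sum collapses to 0.
sum(dbinom(x = 0:98, size = 100, prob = 1, log = FALSE))
#> [1] 0

## Equivalently, the CDF at 98 is zero:
pbinom(98, size = 100, prob = 1)
#> [1] 0
```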

Question: I clearly see that a sample estimate of the probability equal to one should be compatible with an estimate of 0.98 in another sample, so H0 of no difference ought to hold here. However, the formal test does not work in this limit case. Is there a workaround to run the hypothesis test?

Alexey Burnakov
  • FYI: https://stats.stackexchange.com/questions/134380/how-to-tell-the-probability-of-failure-if-there-were-no-failures and https://stats.stackexchange.com/questions/82720/confidence-interval-around-binomial-estimate-of-0-or-1/82724#82724 – Tim Aug 17 '17 at 10:11

1 Answer


The hypothesis you start with is a very bad one. If you assume that the probability of success is equal to $1$ (successes always happen), this means that you assume a degenerate distribution for your data:

$$ f(k) = \begin{cases} 1 & \text{if } k = n \\ 0 & \text{if } k \ne n \end{cases} $$

Actually, it follows from the binomial distribution since for $k=n$

$$ {n\choose k}p^k(1-p)^{n-k} = {n\choose n} 1^n(1-1)^{n-n} = 1 \times 1 \times 1 = 1 $$

adopting the convention that $0^0 = 1$, while for $k < n$

$$ {n\choose k}p^k(1-p)^{n-k} = {n\choose k} 1^k(1-1)^{n-k} = {n\choose k} \times 1 \times 0 = 0 $$

since $0^{n-k} = 0$ for $k < n$.

Since the assumed distribution is degenerate, there is no uncertainty, or "randomness", in your data: you assume that the only thing that can happen is $k=n$. So your hypothesis test is very simple: if $k=n$, your $p$-value is $1$; otherwise it is $0$.
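A minimal R illustration of this (the `p_value_degenerate` helper is just a name for the indicator above, not a standard R function):

```r
## Under prob = 1 the binomial collapses to a point mass at k = n:
dbinom(100, size = 100, prob = 1)  # 1
dbinom(98,  size = 100, prob = 1)  # 0

## So the test is trivial: p-value 1 if k = n, and 0 otherwise
## (p_value_degenerate is a hypothetical helper, not a base R function).
p_value_degenerate <- function(k, n) as.numeric(k == n)
p_value_degenerate(98, 100)   # 0
p_value_degenerate(100, 100)  # 1
```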

Tim
  • Thank you! It is clear. The same result comes from dbinom: the p-value is 0 whenever the test sample is not all successes. However, isn't it strange that a probability estimated as zero from a limited sample rules out even a single occurrence of that event in another random sample drawn from the same population? I want to test whether the shares in two different random samples differ significantly, with one complication: the share in the control sample is zero. – Alexey Burnakov Aug 17 '17 at 10:44
  • @AlexBurn What is strange about it? You assume that successes will *always* happen, since you assume they happen with probability 1. You leave no room for any uncertainty. – Tim Aug 17 '17 at 10:50
  • I understand you perfectly. My deeper question is: if I am given two coins of arbitrary quality, and one of them shows 10 heads in 10 tosses while the other shows 8 heads in 10 tosses, are these coins identical? What is the probability of them being different (p-value)? Maybe I phrased my original question poorly. – Alexey Burnakov Aug 17 '17 at 10:53
  • Thanks for this link: https://stats.stackexchange.com/questions/134380/how-to-tell-the-probability-of-failure-if-there-were-no-failures It looks like what I need. – Alexey Burnakov Aug 17 '17 at 11:06
  • @AlexBurn yeah, check the two links. Basically: in your scenario you *cannot* use the standard estimator of $p$, i.e. $k/n$, since it will be wrong. It follows that you cannot use the "standard" methods for such data. You could simply replace the estimates of $p$ in your test with robust estimates. – Tim Aug 17 '17 at 11:11
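As an illustration of that last point (not code from the thread), one robust replacement discussed in the first linked question is the rule-of-three upper bound on the failure probability:

```r
## Control sample: 1000 trials, 0 failures.
n_control <- 1000

## Rule of three: an approximate 95% upper bound on the failure probability,
## when 0 failures were observed in n trials, is 3/n, instead of the
## plug-in estimate of exactly 0.
p_fail_upper <- 3 / n_control      # 0.003
p_success    <- 1 - p_fail_upper   # 0.997

## With this value instead of prob = 1, the left-tail probability of
## 98 or fewer successes in 100 trials is no longer identically zero:
pbinom(98, size = 100, prob = p_success)
#> about 0.037
```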