Questions tagged [hypergeometric-distribution]

A discrete distribution used to model sampling without replacement.

The hypergeometric distribution is a discrete distribution. It is used to model sampling without replacement from a collection of objects regarded as being of two types - for example, drawing otherwise identical colored balls from an urn.

Specifically, in that setting, it gives the probability of drawing $k$ red balls ("successes") in a sample of $n$ balls drawn without replacement from an urn containing $K$ red balls among $N$ balls in total.

The probability mass function of the distribution is:

$$P(X=k) = \frac{ {K \choose k} {N-K \choose n-k} }{N \choose n}$$
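
As a minimal sketch, the mass function can be evaluated directly with binomial coefficients: choose $k$ of the $K$ red balls and $n-k$ of the $N-K$ non-red balls, out of all ways to choose $n$ of $N$. The function name `hypergeom_pmf` and the urn sizes in the example are illustrative assumptions, not part of the tag description.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): k red balls in n draws without replacement
    from an urn of N balls, K of which are red."""
    # Outside the support the probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Hypothetical urn: N = 10 balls, K = 4 red, sample of n = 3.
pmf = [hypergeom_pmf(k, N=10, K=4, n=3) for k in range(4)]
print(pmf)
```

The probabilities over $k = 0, \dots, 3$ sum to one, which is a quick sanity check on the parameterization.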

It arises in a number of contexts in probability and statistics, including the analysis of 2x2 contingency tables when both margins are conditioned on, as is the case with Fisher's exact test.
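
To illustrate the connection with Fisher's exact test, here is a rough sketch of a one-sided p-value for a 2x2 table $[[a, b], [c, d]]$. With both margins fixed, the (1,1) cell follows a hypergeometric distribution with $N = a+b+c+d$, $K = a+b$ and $n = a+c$, and the p-value is the upper tail from the observed count. The function name `fisher_one_sided` and the example table are assumptions made for illustration; statistical packages usually also offer a two-sided version, which is not shown here.

```python
from math import comb

def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(X >= a) for the (1,1) cell of [[a, b], [c, d]] with both margins fixed."""
    N = a + b + c + d      # total count
    K = a + b              # first-row total ("successes" in the population)
    n = a + c              # first-column total (sample size)
    upper = min(K, n)      # largest value the (1,1) cell can take
    # Sum the hypergeometric numerators over the upper tail, then normalize.
    tail = sum(comb(K, k) * comb(N - K, n - k) for k in range(a, upper + 1))
    return tail / comb(N, n)

# Hypothetical table: 3 and 1 in the first row, 1 and 3 in the second.
p = fisher_one_sided(3, 1, 1, 3)
print(p)
```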

Reference: Wikipedia - Hypergeometric distribution

187 questions
15
votes
1 answer

Fisher's Exact Test and Hypergeometric Distribution

I wanted to understand Fisher's exact test better, so I devised the following toy example, where f and m correspond to female and male, and n and y correspond to "soda consumption", like this: > soda_gender f m n 0 5 y 5 0 Obviously,…
Alby
  • 2,103
  • 3
  • 19
  • 22
11
votes
3 answers

Probability of intersection from multiple sampling of the same population

Here is an example case: I have a population of 10,000 items. Each item has a unique ID. I randomly pick 100 items and record the IDs. I put the 100 items back into the population. I randomly pick 100 items again, record the IDs, and…
daemonk
  • 211
  • 1
  • 4
11
votes
3 answers

What is the probability of n people from a list of m people being in a random selection of x people from a list of y people?

If I am selecting 232 people from a pool of 363 people without replacement, what is the probability of 2 of a list of 12 specific people being in that selection? This is a random draw for an ultra race where there were 363 entrants for 232 spots.…
Sarge
  • 365
  • 2
  • 11
9
votes
2 answers

How to apply multiple testing correction for gene list overlap using R

I have 2 studies looking at the patient response to the same drug. Study 1 found 10,000 genes expressed above the background and 500 of them are differentially expressed and referred to as the drug response signature. Study 2 found 1,000 genes…
9
votes
3 answers

What is the test statistic in Fisher's exact test?

For a 2 by 2 contingency table, some said Fisher's exact test uses the count $X_{1,1}$ in the (1,1) cell in the table as the test statistic, and under null hypothesis, $X_{1,1}$ will have a hypergeometric distribution. Some said its test statistic…
9
votes
1 answer

Hypergeometric: how do I construct a credibility interval around K (population successes) in R?

I have a problem for which I believe I should use the hypergeometric distribution, but I can't figure out how to do it in R. Say I have a bag of marbles with known number ($N$) of marbles, but the number of successes (white marbles) in the bag ($K$)…
8
votes
2 answers

Computation of hypergeometric function in R

I'm having tremendous difficulty evaluating $_2F_1(a,b;c;z)$ with the hypergeo package in R. In my case, values of $a$, $b$, $c$ are always positive real numbers. Even so, the hypergeometric function is incredibly sensitive to their values. I am not…
benrolls
  • 113
  • 1
  • 5
8
votes
0 answers

How to construct confidence limits based on small stratified samples of finite populations?

Imagine a business wishes to audit its transactions. It has a database summarizing the transactions, which constitute a sampling frame for the population. It would be time-consuming and expensive to examine each transaction in detail, so the…
8
votes
1 answer

How to calculate a sample size for validating correct/incorrectness of records in a data table?

I have read through existing answers on CrossValidated (plus elsewhere online) and can't find what I'm looking for, but do please point me to existing sources if I've missed them. Let's say I have a data set of N=1000 records, each of which can be…
7
votes
1 answer

What are the differences between the hypergeometric distribution and the chi-square distribution?

As the title suggests, I have a very basic question. I have a case with the following data: Universe: 18840 balls; total red balls in the universe: 6680; Sample: 382 balls; total red balls in the sample: 160. I would like to estimate if the percentage…
7
votes
2 answers

Use Fisher's Exact Test or a Hypergeometric Test?

I'm looking to test for set enrichment and I'm wondering whether Fisher's exact test or a hypergeometric test is more appropriate (and, if there isn't a straightforward answer, what the relative merits are). To lay out an example problem, I have a…
Jautis
  • 588
  • 1
  • 4
  • 13
7
votes
1 answer

Probability of failure in a finite population

I regularly inspect finite populations for failures (we make custom products in batches of ~500-800). Currently, we inspect every product for failure, which is quite a bit of work. I want to reduce the number of samples we inspect by stating a…
7
votes
1 answer

Are the balls drawn randomly (independently of the number of balls existing in their colours)?

We have a big urn that contains $N_{Tot}$ balls. Balls are of $r$ different colours. The number of balls of the $i^{th}$ colour (before sampling) is $N_i$. John sampled $x$ balls in total (without replacement) from this urn. The number of balls of…
Sulawesi
  • 319
  • 2
  • 8
7
votes
1 answer

Estimating Size of a Set based on two Overlapping Subsets

I've searched everywhere for a similar question and many things come close but are not the same. I'm looking for a way to estimate the size of a set if two partially overlapping subsets are known (assuming both subsets were selected at…
6
votes
1 answer

Significance of overlap between multiple lists

I am trying to evaluate the significance of overlap between several gene lists. Here I have applied different methods to select genes relevant to a disease and I have several 4-way Venn diagrams illustrating the results. My main goal is to…
gazwb
  • 159
  • 1
  • 3