How to use hyper-geometric test

Question

My professor wrote some things very quickly on the board, and I had a very hard time working out what arguments were being made. I am trying to test the conclusion. I read this post, but I'm still not quite grasping it. If I could be pointed in the right direction with some solid guidance, that would be really helpful. Mainly, my questions are the following:

Why/When exactly would one use the hyper-geometric test?
Is it obvious what the null hypothesis is here?
What exactly is the argument that is being made here?

score 5 · Accepted Answer · answered Aug 29 '17 at 09:37


You can look at Wikipedia:

The hypergeometric test uses the hypergeometric distribution to measure the statistical significance of having drawn a sample consisting of a specific number of k successes (out of n total draws) from a population of size N containing K successes. In a test for over-representation of successes in the sample, the hypergeometric p-value is calculated as the probability of randomly drawing k or more successes from the population in n total draws. In a test for under-representation, the p-value is the probability of randomly drawing k or fewer successes.
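As a rough illustration of the definition quoted above, here is a minimal Python sketch of both tail probabilities using only the standard library. The function names and the example numbers (a population of 50 with 5 successes, 10 draws, 4 successes observed) are made up for illustration and do not come from the question.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): exactly k successes in n draws without replacement
    from a population of N items, K of which are successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def hypergeom_over_pvalue(k, N, K, n):
    """Over-representation p-value: P(X >= k)."""
    upper = min(n, K)  # cannot draw more successes than exist or than draws
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, upper + 1))

def hypergeom_under_pvalue(k, N, K, n):
    """Under-representation p-value: P(X <= k)."""
    lower = max(0, n - (N - K))  # fewest successes that n draws can contain
    return sum(hypergeom_pmf(i, N, K, n) for i in range(lower, k + 1))

# Hypothetical example: N = 50, K = 5, n = 10, observed k = 4.
p_over = hypergeom_over_pvalue(4, N=50, K=5, n=10)
p_under = hypergeom_under_pvalue(4, N=50, K=5, n=10)
print(f"P(X >= 4) = {p_over:.6f}")
print(f"P(X <= 4) = {p_under:.6f}")
```

Both tails include the observed value k, so the two p-values add up to slightly more than 1.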

The null hypothesis here is that u, that is, the probability of the gene under radiation, is equal to p, the probability of the gene without radiation. While I would think that p has to be considered uncertain here as well, a shortcut is taken by estimating p directly from the sample, as $\frac{A}{A+B}$. It is possible that this is equivalent.

The formulas give the probability, under the null hypothesis (where $p = \frac{A}{A + B}$), that you would see $C$ or more bacteria with the gene out of $C + D$ samples. That probability is the $p$-value.
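To make this concrete, here is a small Python sketch of that tail probability, treating it as a binomial sum with $C + D$ trials. It assumes $A$ and $B$ count bacteria with and without the gene in the unirradiated group, and $C$ and $D$ the same in the irradiated group, which is my reading of the board notation. The counts used are invented for illustration. The failure exponent is written as $C + D - u$, which is what a binomial with $C + D$ trials requires.

```python
from math import comb

def binomial_upper_tail(c, n_trials, p):
    """P(U >= c) for U ~ Binomial(n_trials, p)."""
    return sum(
        comb(n_trials, u) * p**u * (1 - p) ** (n_trials - u)
        for u in range(c, n_trials + 1)
    )

# Hypothetical counts, not taken from the question:
# A, B = with / without the gene, no radiation
# C, D = with / without the gene, under radiation
A, B, C, D = 30, 70, 12, 8

p_hat = A / (A + B)  # null-hypothesis probability estimated from the control group
p_value = binomial_upper_tail(C, C + D, p_hat)  # failures exponent is (C + D - u)
print(f"p_hat = {p_hat:.3f}, P(U >= {C}) = {p_value:.5f}")
```

A small value here means that seeing $C$ or more bacteria with the gene would be surprising if radiation made no difference.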

That is the answer. Since you are thinking about $p$-values, maybe you can read a couple of blog posts by Andrew Gelman. I think it is a good idea to be a bit sceptical about this hypothesis-testing framework.


Gijs


Just to clarify, I'll restate exactly what I think you are saying: Null hypothesis is "Probability of expression of gene without radiation = probability of expression of gene with radiation." Correct? And that probability formula tests whether I would see C or greater # of bacteria (out of C+D samples). As an aside: the way the null hypothesis works into the equation is by multiplying the quantity "(A/(A+B))^u" correct? – Christian Aug 29 '17 at 20:45
So if P(U >= C) is significantly greater than or less than (has a value greater than 50% essentially), then the null hypothesis is void. Otherwise, we can trust it. This is how I've understood your answer. Please correct me if I'm wrong. – Christian Aug 29 '17 at 20:47
Yes, the formulation of the null hypothesis is how I understood it as well. But the usual "level" (that's the term used) to test $P(U \geq C)$ against is 0.05, not 50%. That means that if you are quite surprised to see the data (under the null hypothesis you would expect something this extreme in at most 1 in 20 cases), you regard the null hypothesis as suspect. Be careful, though: if you cannot reject the null hypothesis, that doesn't mean it's correct. For example, if you only see one data point, you will never be able to reject it. In this situation, your study is said to have low "power". – Gijs Aug 30 '17 at 08:55
Actually, I just realized that the formula as written looks like the binomial distribution. Is that correct? And why is the exponent on (1-P) the way it is? Is there an error? I believe it should be (C+D-U), but it just says (D-U). @Gijs – Christian Sep 02 '17 at 06:58
Yes, it is the binomial distribution. It is just the number of "successes" with probability p out of c + d trials. – Gijs Sep 02 '17 at 08:32
So it IS just (d-u)? and not (c+d-u)? @Gijs – Christian Sep 02 '17 at 08:34
Oh and I'm not referring to the "number of success" entity, but rather the "number of failures" entity. – Christian Sep 02 '17 at 08:49
Didn't catch that... But yes, I think you're right, it's an error. Otherwise, I cannot explain it. – Gijs Sep 02 '17 at 10:53
How is this different from running Fisher's exact test? – Harvey Motulsky Jan 31 '19 at 15:41
