I've detected a set of differentially expressed biomarkers in a sample, and I want to test whether observing this many is statistically significant.
For a similar problem, I've seen the hypergeometric test being used, where
- $k$ = number of detected differentially expressed biomarkers
- $K$ = total number of known differentially expressed biomarkers
- $n$ = size of sample
- $N$ = total population
to compute the p-value of seeing $\geq k$ biomarkers.
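To make the setup concrete, here is a minimal sketch of how that tail probability could be computed with `scipy.stats.hypergeom`. The helper name `hypergeom_pvalue` and all numbers are illustrative assumptions, not values from my data.

```python
from scipy.stats import hypergeom


def hypergeom_pvalue(k, K, n, N):
    """One-sided p-value P(X >= k) for a hypergeometric count X."""
    # SciPy's argument order is (population size, successes in the
    # population, number of draws), i.e. (N, K, n) in the notation above.
    # sf(k - 1) = P(X > k - 1) = P(X >= k).
    return float(hypergeom.sf(k - 1, N, K, n))


# Placeholder values for illustration only (not real data).
p = hypergeom_pvalue(k=5, K=500, n=10**6, N=10**9)
print(f"P(X >= 5) = {p:.3e}")
```

Note that the `k - 1` matters: `sf(k)` alone would give $P(X > k)$ and drop the observed count from the tail.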
The tricky thing here is:
- the event is very rare, i.e. $N \gg K$ (specifically, $\frac{K}{N} < 10^{-6}$)
- the true value of $K$ is unknown; I have an approximate number, but the actual value of $K$ is likely to be larger. I've seen this post, but I'm not sure it's applicable to my dataset given the rarity of seeing a "Type I" object
- [EDIT] the typical size of my sample, $n$, is around $10^6$, and the sampling is without replacement. Side note: the true value of $N$ is not known either, but it is typically approximated as $N \geq 10^9$
To compute the p-value of seeing $\geq k$ biomarkers for my dataset, does it still make sense to use a hypergeometric test?
I was wondering whether a Poisson exact test makes more sense, where the null hypothesis is that the rate equals $K/N$ and the alternative is that the rate in my sample, estimated by $k/n$, is higher.
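As a rough sketch of the Poisson version, assuming the null expected count is $\mu = nK/N$ and the test is one-sided, the p-value would be $P(X \geq k)$ for $X \sim \text{Poisson}(\mu)$. The helper name `poisson_pvalue` and the numbers below are placeholders, and the loop over $K$ is only there to show how the result moves if $K$ is underestimated.

```python
from scipy.stats import poisson


def poisson_pvalue(k, K, n, N):
    """One-sided p-value P(X >= k) for X ~ Poisson(mu), mu = n * K / N."""
    mu = n * K / N  # expected count in the sample under the null rate K/N
    # sf(k - 1) = P(X > k - 1) = P(X >= k).
    return float(poisson.sf(k - 1, mu))


# Placeholder values for illustration only (not real data).
k, n, N = 5, 10**6, 10**9
for K in (500, 1000, 2000):  # hypothetical candidates for the unknown K
    print(f"K = {K}: p = {poisson_pvalue(k, K, n, N):.3e}")
```

Since $P(X \geq k)$ increases with $\mu$, a larger true $K$ gives a larger p-value, so a p-value computed from an underestimated $K$ would be too small (anti-conservative).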