I have a set of genes from particular species. Metagenome samples were mapped to this database of gene sequences. As a result, I obtained a presence/absence matrix covering all the genes of all the species, which looked like this:
            sample_1  sample_2  ...  sample_n
    gene_1     1         0      ...     1
    gene_2     1         1      ...     0
    ...
    gene_m     0         0      ...     0
Here 0 means the gene is absent from the sample, and 1 means it is present.
Some of the samples were taken from patients with a particular disease and some from healthy people, so they fall into two groups: case and control. For each gene, my null hypothesis is that the gene is present in the same proportion of samples in the case and control groups. The alternative is that the number of samples in which the gene is present versus absent is distributed unevenly between the two groups. I assume I can build a contingency table for each gene:
                     Case | Control
                    ------+--------
    Gene_i present    a   |    b
    Gene_i absent     c   |    d
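The per-gene tables can be built from the presence/absence matrix in one vectorized step. This is a minimal sketch assuming NumPy, a genes × samples array of 0/1 values, and a boolean vector marking which samples are cases; `contingency_tables` is a hypothetical helper name, and the toy matrix is made up for illustration.

```python
import numpy as np

def contingency_tables(presence, is_case) -> np.ndarray:
    """Return an array of shape (n_genes, 2, 2) holding [[a, b], [c, d]] per gene.

    presence: (n_genes, n_samples) array of 0/1 values (1 = gene present).
    is_case:  (n_samples,) boolean array, True for case samples, False for controls.
    """
    presence = np.asarray(presence, dtype=bool)
    is_case = np.asarray(is_case, dtype=bool)
    a = (presence & is_case).sum(axis=1)    # present in case samples
    b = (presence & ~is_case).sum(axis=1)   # present in control samples
    c = (~presence & is_case).sum(axis=1)   # absent in case samples
    d = (~presence & ~is_case).sum(axis=1)  # absent in control samples
    top = np.stack([a, b], axis=1)          # row "Gene_i present"
    bottom = np.stack([c, d], axis=1)       # row "Gene_i absent"
    return np.stack([top, bottom], axis=1)

# Toy example: 2 genes, 4 samples (first two are cases, last two controls).
presence = np.array([[1, 0, 1, 1],
                     [0, 0, 0, 1]])
is_case = np.array([True, True, False, False])
tables = contingency_tables(presence, is_case)
# tables[0] is [[1, 2], [1, 0]] for gene_1
```

Every table sums to the total number of samples, and its column totals are the fixed case and control group sizes.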
I thought I could use Fisher's exact test (FET) or the Chi2 test, depending on the number of observations in the cells: if the minimum cell count is less than 10, I use FET; otherwise, I use Chi2.
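A minimal sketch of that selection rule, assuming SciPy is available and each table is a 2×2 array `[[a, b], [c, d]]` laid out as above; `choose_and_test` is a hypothetical helper name. `fisher_exact` is two-sided by default, and `chi2_contingency` applies Yates' continuity correction to 2×2 tables unless `correction=False` is passed.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

def choose_and_test(table, min_count: int = 10):
    """Return (test_name, p_value) for one 2x2 table [[a, b], [c, d]].

    Uses Fisher's exact test when the smallest observed cell count is below
    `min_count`, otherwise the Chi2 test of independence.
    """
    table = np.asarray(table)
    if table.min() < min_count:
        _, p = fisher_exact(table)  # two-sided by default
        return "FET", p
    # Yates' continuity correction is applied by default for 2x2 tables
    _, p, _, _ = chi2_contingency(table)
    return "Chi2", p

name_small, p_small = choose_and_test([[1, 2], [1, 0]])     # uses FET
name_large, p_large = choose_and_test([[20, 30], [40, 10]])  # uses Chi2
```

As written, the rule checks observed cell counts, as in the question; the usual rule of thumb for the Chi2 approximation is stated in terms of expected counts, which `chi2_contingency` returns as its fourth value.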
However, I doubt whether the FET assumption of fixed marginal totals is met here, because the number of samples in which a gene is present is not fixed in advance.
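To see where the fixed-margin assumption enters: when both the row totals (number of samples in which gene_i is present or absent) and the column totals (number of case and control samples) are held fixed, cell a follows a hypergeometric distribution, and the FET p-value is a tail probability of that distribution. The sketch below, assuming SciPy, computes the one-sided ("greater") p-value directly from `scipy.stats.hypergeom` and compares it with `scipy.stats.fisher_exact`. The helper name and the example table are hypothetical. It only shows where the conditioning happens; it does not settle whether conditioning on the presence total is justified for this design.

```python
import numpy as np
from scipy.stats import fisher_exact, hypergeom

def fet_greater_via_hypergeom(table) -> float:
    """One-sided FET p-value P(A >= a), computed from the hypergeometric law.

    With both margins of [[a, b], [c, d]] held fixed, cell A follows a
    hypergeometric distribution: population M = a+b+c+d, n = a+b samples
    with the gene present, N = a+c case samples drawn.
    """
    (a, b), (c, d) = np.asarray(table)
    M, n, N = a + b + c + d, a + b, a + c
    return hypergeom.sf(a - 1, M, n, N)  # sf(a-1) = P(A > a-1) = P(A >= a)

table = [[8, 2], [3, 7]]
p_direct = fet_greater_via_hypergeom(table)
_, p_scipy = fisher_exact(table, alternative="greater")
print(p_direct, p_scipy)  # the two values should agree
```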
Because the test applied depends on each table, my p-values come from a mix of Chi2 tests and Fisher's exact tests. Is it appropriate to use a single procedure to adjust p-values obtained from different tests, or should the p-values from the two tests be treated separately?
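Mechanically, using one procedure means pooling every gene's p-value into a single vector, regardless of which test produced it, and adjusting that vector once. The sketch below uses Benjamini-Hochberg purely as an example of such a procedure (Bonferroni would be another); it assumes NumPy, the helper name and the input p-values are hypothetical, and it does not by itself answer whether pooling is appropriate. statsmodels provides the same adjustment as `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")`.

```python
import numpy as np

def benjamini_hochberg(pvals) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)  # p_(i) * m / i
    # Enforce monotonicity from the largest p-value downwards, then cap at 1.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

pvals_fet = [0.01, 0.04]   # hypothetical p-values from genes tested with FET
pvals_chi2 = [0.03, 0.5]   # hypothetical p-values from genes tested with Chi2
pooled = np.concatenate([pvals_fet, pvals_chi2])
adjusted = benjamini_hochberg(pooled)  # ≈ [0.04, 0.0533, 0.0533, 0.5]
```

Treating the two sets separately would instead mean calling the same function once on `pvals_fet` and once on `pvals_chi2`.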