How can I run ANOVA or tests for statistical significance on a bi-modal sample that came from a normal population?
Context:
I was tasked with running an ANOVA to test whether genotypes (treatments / factors) affect phenotypes (responses) in a simple fixed-effects model. The standard approach is to fit a fixed-effects linear model of phenotype on each genotype, run ANOVA, and then apply Tukey's HSD. See here for more context.
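For concreteness, the workflow described above (linear model, ANOVA, Tukey's HSD) for a single genotype could look roughly like the sketch below. The data are simulated stand-ins: the genotype labels `AA`/`AB`/`BB`, the group sizes, and the means are all made up for illustration. In the real analysis this would presumably be repeated for each genotype.

```r
# Simulated stand-in data: genotype labels and effect sizes are invented.
set.seed(1)
d <- data.frame(
  genotype  = factor(rep(c("AA", "AB", "BB"), each = 40)),
  phenotype = c(rnorm(40, mean = 4.5), rnorm(40, mean = 5.0), rnorm(40, mean = 5.5))
)

# One-way fixed-effects model: phenotype explained by a single genotype factor.
fit <- aov(phenotype ~ genotype, data = d)

# ANOVA table (F test for any difference among the genotype means).
anova_table <- summary(fit)[[1]]
print(anova_table)

# Tukey's HSD: all pairwise comparisons among the genotype levels.
tukey <- TukeyHSD(fit, which = "genotype")
print(tukey)
```

`aov()` is used here instead of `lm()` only because `TukeyHSD()` expects an `aov` fit; the underlying model is the same.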
Unfortunately, of the 416 samples phenotyped, only 356 were genotyped (~86%). Furthermore, the genotyped samples were not chosen at random; most of the excluded samples came from the peak of the distribution, shifting the sample from a normal distribution to a bimodal one.
As far as I know, ANOVA is not appropriate when its normality assumption fails.
To fix this, would I use bootstrapping or parametrization? How would I set that up?
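To make the bootstrapping option concrete, here is one possible setup, offered as a sketch rather than a recommendation: a residual bootstrap of the ANOVA F statistic under the null hypothesis of no genotype effect. The data are simulated, and the helper `f_stat`, the genotype labels, and the replicate count `B` are all hypothetical choices. Note that resampling does not by itself undo non-random genotyping; it only replaces the normal-theory reference distribution for F.

```r
set.seed(42)
# Hypothetical data: integer scores on a 1-9 scale and one three-level genotype.
n <- 120
d <- data.frame(
  genotype  = factor(sample(c("AA", "AB", "BB"), n, replace = TRUE)),
  phenotype = sample(1:9, n, replace = TRUE)
)

# F statistic of the one-way model y ~ g.
f_stat <- function(y, g) {
  anova(lm(y ~ g))[1, "F value"]
}

f_obs <- f_stat(d$phenotype, d$genotype)

# Residuals of the null (intercept-only) model: deviations from the grand mean.
y_bar <- mean(d$phenotype)
res0  <- d$phenotype - y_bar

B <- 999  # number of bootstrap replicates (arbitrary choice)
f_boot <- replicate(B, {
  y_star <- y_bar + sample(res0, replace = TRUE)  # resample residuals with replacement
  f_stat(y_star, d$genotype)
})

# Bootstrap p-value: share of null replicates at least as large as the observed F.
p_boot <- (1 + sum(f_boot >= f_obs)) / (B + 1)
cat("Observed F:", round(f_obs, 3), " bootstrap p-value:", round(p_boot, 3), "\n")
```

Replacing `sample(res0, replace = TRUE)` with `sample(d$phenotype)` (no replacement) would turn the same loop into a permutation test.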
Mock data showing the extent of the sampling bias (in R):
# Phenotype counts for all 416 phenotyped samples
data_pre_sample <- data.frame(
  phenotype = factor(1:9),
  frequency = c(4, 16, 48, 108, 116, 88, 32, 4, 0)
)
# Phenotype counts for the 356 samples that were also genotyped
data_post_sample <- data.frame(
  phenotype = factor(1:9),
  frequency = c(4, 16, 48, 108, 64, 80, 32, 4, 0)
)
Phenotypes:
- 1, 2, 3, 4, 5, 6, 7, 8, 9
Pre-selection phenotype count:
- 4, 16, 48, 108, 116, 88, 32, 4, 0
Post-selection phenotype count:
- 4, 16, 48, 108, 64, 80, 32, 4, 0
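The extent of the bias can be read directly off these counts. The short sketch below (variable names such as `retained` and `bias` are just illustrative) computes, for each phenotype class, how many samples were dropped and what fraction was genotyped. All 60 dropped samples come from phenotypes 5 (52 samples) and 6 (8 samples).

```r
phenotype <- 1:9
pre  <- c(4, 16, 48, 108, 116, 88, 32, 4, 0)  # phenotyped
post <- c(4, 16, 48, 108,  64, 80, 32, 4, 0)  # phenotyped and genotyped

# Fraction of each phenotype class that was genotyped; NA where the class is empty.
retained <- ifelse(pre > 0, post / pre, NA)

bias <- data.frame(phenotype, pre, post,
                   dropped  = pre - post,
                   retained = round(retained, 3))
print(bias)

cat("Total phenotyped:", sum(pre), "\n")
cat("Total genotyped: ", sum(post), "\n")
cat("Overall fraction genotyped:", round(sum(post) / sum(pre), 3), "\n")
```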
Sources Consulted:
- Explaining to laypeople why bootstrapping works
- Assumptions regarding bootstrap estimates of uncertainty
- Difference between ANOVA and permutation test
- I read the first and third paper mentioned here as well
- Correcting biased survey results
- Fixing a biased (deliberately) sample
- http://www.stat.cmu.edu/~cshalizi/uADA/13/lectures/which-bootstrap-when.pdf
I apologize if my question isn't clear; I am relatively new to stats stackexchange and I am just refreshing and expanding my knowledge in statistics.
Updates June 19th
The underlying distribution of the phenotype scores should be normal. Scores are on a 1-9 scale (1 being the lowest performing, 9 the best performing). The geneticists were interested in which treatments (genotypes) correspond to the worst and best performers, so they left out many of the middle-performing samples (phenotype 5) when taking measurements. It is also safe to assume that many of the treatments have no association with the phenotype (so far I have included 739 treatments).
How would I implement sample weights or corrections? Should I bootstrap residuals, or am I going down the wrong path entirely?
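One way to read "sample weights" here, sketched under assumptions and not verified as appropriate for this design: weight each genotyped sample by the inverse of the fraction of its phenotype class that was genotyped, using the mock counts above. The genotype column is randomly generated purely so the model runs, and `w_class` and `fit_w` are illustrative names.

```r
pre  <- c(4, 16, 48, 108, 116, 88, 32, 4, 0)  # phenotyped
post <- c(4, 16, 48, 108,  64, 80, 32, 4, 0)  # phenotyped and genotyped

# Inverse-probability weight per phenotype class: 1 / P(genotyped | phenotype).
w_class <- ifelse(post > 0, pre / post, NA)

# Expand the post-selection counts to one row per genotyped sample.
set.seed(7)
d <- data.frame(phenotype = rep(1:9, times = post))
d$weight <- w_class[d$phenotype]
# Hypothetical genotype, unrelated to phenotype, only to make the model runnable.
d$genotype <- factor(sample(c("AA", "AB", "BB"), nrow(d), replace = TRUE))

# Check: weighted counts per class match the pre-selection counts
# (the empty class 9 comes out as NA).
weighted_counts <- tapply(d$weight, factor(d$phenotype, levels = 1:9), sum)
print(round(weighted_counts))

# Weighted one-way model for the hypothetical genotype.
fit_w <- lm(phenotype ~ genotype, data = d, weights = weight)
print(anova(fit_w))
```

Two caveats apply to this sketch. The selection here depends on the response itself, which is not the usual setting for survey-style reweighting. Also, `lm()` interprets `weights` as precision weights, so the standard errors and p-values it reports should not be taken at face value for sampling weights.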