I have data on survival and fecundity (fruit production) for a bunch of plants grown in multiple experiments. There are multiple plants per genotype, so I take the trait mean per genotype.
I am interested in the (Spearman's rank) correlation between genotype means for these variables, and the uncertainty around it. To estimate confidence intervals I have used a non-parametric bootstrap to re-estimate genotype means and correlations. However, I find that the bootstrap distribution is substantially lower than the observed values. In most cases the observed statistic lies above the upper confidence limit of the bootstrap distribution.
My first thought is that I made a mistake in the code, but I have checked multiple times and cannot find anything.
My question is: can this be something real in that case? For example, some kind of bias in the observed data that doesn't carry over to the resamples? If so, would it be valid to report the mean of the bootstrap distribution, or should I do something else?
If the answer is 'no', then I know I must have made a coding error and will keep looking.
Additional details:
- I am using the `cor` function in R to calculate correlations, with `method='spearman'` and `use='complete.obs'`.
- Survival is a proportion. Fecundity is count data, and has a lot (~27%) of missing data (plants that did not survive to reproduce).
- I do not see any evidence for outliers in the plots.
- I don't think this is an artefact of small sample size (there are >400 genotypes, with 18 plants per genotype)
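
A minimal sketch of the kind of bootstrap described above, run on simulated data, might look like the following. It assumes that plants are resampled with replacement within each genotype, which the description does not spell out. The data frame `plants`, the helpers `geno_cor` and `resample_within`, and all simulation parameters are hypothetical and are not the actual analysis code.

```r
set.seed(1)

# Simulated stand-in data (hypothetical): 50 genotypes x 18 plants each
n_geno  <- 50
n_plant <- 18
geno    <- rep(seq_len(n_geno), each = n_plant)
quality <- rnorm(n_geno)  # latent genotype effect shared by both traits

survived <- rbinom(n_geno * n_plant, 1, plogis(1 + quality[geno]))
fruits   <- rpois(n_geno * n_plant, exp(1 + 0.5 * quality[geno]))
fruits[survived == 0] <- NA  # plants that died have no fecundity value

plants <- data.frame(geno, survived, fruits)

# Spearman correlation between genotype means of survival and fecundity
geno_cor <- function(d) {
  surv <- tapply(d$survived, d$geno, mean)
  fec  <- tapply(d$fruits, d$geno, mean, na.rm = TRUE)
  cor(surv, fec, method = "spearman", use = "complete.obs")
}

# Resample plants with replacement within each genotype (assumed scheme)
resample_within <- function(d) {
  rows <- split(seq_len(nrow(d)), d$geno)
  idx  <- unlist(lapply(rows, function(i) {
    i[sample.int(length(i), replace = TRUE)]
  }), use.names = FALSE)
  d[idx, ]
}

observed <- geno_cor(plants)
boot     <- replicate(500, geno_cor(resample_within(plants)))
ci       <- quantile(boot, c(0.025, 0.975))  # percentile interval
```

Comparing `observed` with `ci` and `mean(boot)` shows where the observed statistic falls relative to the bootstrap distribution under this particular resampling scheme.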