For an ecological research project, I am trying to model the effect of different factors on the prevalence of a specific pathogen in ticks.
Ticks were collected from around 80 different plots and screened for pathogens. Prevalence is the proportion of ticks that tested positive in each plot.
Ideally, I would like to use prevalence as the response variable in a GLM or GLMM. My assumption is that it follows a negative binomial distribution, so I would treat the proportion as a count variable (number of positives out of 100 samples).
However, the number of ticks collected varies across plots, from 1 to 110.
The prevalence calculated for plots with small sample sizes is less reliable than for plots with large sample sizes. Is there a way to include a measure of this uncertainty in the analysis?
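To illustrate what I mean by reliability, here is a minimal sketch of how the standard error of an estimated proportion, sqrt(p(1-p)/n), shrinks with the number of ticks tested. The prevalence of 0.2 and the function name `prevalence_se` are assumptions made up for this example, not values from my data.

```python
import math

def prevalence_se(p, n):
    """Standard error of a binomial proportion p estimated from n ticks."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical true prevalence, chosen only for illustration.
assumed_prevalence = 0.2

# Sample sizes spanning the range seen across plots (1 to 110).
for n_ticks in (1, 10, 50, 110):
    se = prevalence_se(assumed_prevalence, n_ticks)
    print(f"n = {n_ticks:>3}: SE = {se:.3f}")
```

Under that assumed prevalence, the standard error is 0.40 for a plot with a single tick and roughly 0.04 for a plot with 110 ticks, so the plots clearly should not all carry the same weight.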
For now, I have excluded all plots with fewer than 50 ticks, but this doesn't seem like an elegant solution.
Should I try a different modeling approach? I feel like there must be a better way to do this.
All help is greatly appreciated!