For my master's thesis I am counting and identifying sediment grains. In total I have 82 samples from 3 different gravity cores. I divided the sediment components into 11 groups (Quartz, Mica, Opaque, Aggregate, Other terrigenous, etc.). To estimate how many grains I need to count, I carried out a preliminary study with 5 samples from one core.
First, I randomly counted and identified 300 grains in every sample. After that, I did the same with 100 grains. The null hypothesis is that there is no difference between counting 100 and 300 grains; the alternative hypothesis is that there is a statistically significant difference between counting 100 and 300 grains.
To compare the two methods (counting 100 or 300 grains), I converted the counts into percentages. Part of the data:
> "Component" "Method" "Percentage"
"Aggregate" "A" 4
"Aggregate" "A" 3
"Aggregate" "A" 2
"Aggregate" "A" 1
"Aggregate" "A" 1
"Aggregate" "B" 1.66666
"Aggregate" "B" 0.66666
"Aggregate" "B" 1.66666
"Aggregate" "B" 2
"Aggregate" "B" 1.33333
"BenthForam" "A" 19
"BenthForam" "A" 11
"BenthForam" "A" 9
"BenthForam" "A" 15
"BenthForam" "A" 13
"BenthForam" "B" 16
"BenthForam" "B" 11.33333
"BenthForam" "B" 11.66666
"BenthForam" "B" 17.66666
"BenthForam" "B" 15.33333
"Mica" "A" 3
"Mica" "A" 19
"Mica" "A" 13
"Mica" "A" 8
"Mica" "A" 14
"Mica" "B" 6.66666
"Mica" "B" 7.33333
"Mica" "B" 10
"Mica" "B" 8.66666
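My first attempt was an ANOVA with a nested linear model (R code):
aov(Percentage ~ factor(Component) + factor(Component):factor(Method), data = grains)  # grains: data frame with the columns shown above (name assumed)
Component is the factor with the 11 groups, Percentage is the percentage of each component in a sample (computed from the counts), and Method is either counting 100 grains (method A) or 300 grains (method B).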
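To make the model reproducible, here is a minimal sketch that enters only the rows shown in the excerpt (so Mica has just four method-B values) into a data frame called `grains` (a hypothetical name), fits the nested ANOVA, and runs two standard base-R checks: `shapiro.test()` on the residuals and `bartlett.test()` across the Component-by-Method cells. With only part of the data, the results will not match those of the full data set.

```r
# Only the rows shown above (the full data set has 11 components);
# method A = 100 grains counted, method B = 300 grains counted.
grains <- data.frame(
  Component = rep(c("Aggregate", "BenthForam", "Mica"), times = c(10, 10, 9)),
  Method = c(rep(c("A", "B"), each = 5),          # Aggregate
             rep(c("A", "B"), each = 5),          # BenthForam
             rep(c("A", "B"), times = c(5, 4))),  # Mica (only 4 B rows shown)
  Percentage = c(4, 3, 2, 1, 1, 1.66666, 0.66666, 1.66666, 2, 1.33333,
                 19, 11, 9, 15, 13, 16, 11.33333, 11.66666, 17.66666, 15.33333,
                 3, 19, 13, 8, 14, 6.66666, 7.33333, 10, 8.66666)
)

# Nested model from the question: Method nested within Component
fit <- aov(Percentage ~ factor(Component) + factor(Component):factor(Method),
           data = grains)
print(summary(fit))

# Normality of the residuals
print(shapiro.test(residuals(fit)))

# Homogeneity of variances across the Component x Method cells
print(bartlett.test(Percentage ~ interaction(Component, Method), data = grains))
```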
However, the residuals of the ANOVA are not normally distributed, and equal variances cannot be assumed. The data also show overdispersion, so I considered negative binomial regression. The problem is that the counts have an upper bound (the total number of grains counted), and the only way I see to use this model would be to exclude one component, such as quartz, since it is the most abundant component in every sample.
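Both the unequal variances and the upper bound follow from how percentages are sampled. As a rough illustration, assume each counted grain independently belongs to a component with probability p (a simple binomial model that ignores the between-sample overdispersion described above). The variance of the resulting percentage is then 100^2 p (1 - p) / n: it depends on p, so rare and common components differ in spread, and it shrinks by a factor of three from 100 to 300 grains. The function name `pct_variance` is hypothetical.

```r
# Variance of a percentage estimated from n counted grains under a simple
# binomial sampling model (each grain independently belongs to the component
# with probability p). Real samples can be overdispersed, so the actual
# spread between samples is typically larger than this.
pct_variance <- function(p, n) 100^2 * p * (1 - p) / n

# Two proportions roughly in the range of the table (about 2 % and 15 %)
p <- c(0.02, 0.15)
print(rbind(n100 = pct_variance(p, 100),
            n300 = pct_variance(p, 300)))
```

With n = 100, a 2 % component has a variance of 1.96 (squared percentage points), while a 15 % component has 12.75, which is one reason a constant-variance ANOVA on percentages fits poorly.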
Which test would you recommend, or should I change my approach? I use R and would appreciate references if possible.