Suppose I have N human classifiers who can only predict 0 or 1 (not probabilistically, and disregarding their own uncertainty: they either know or they don't), and each yields different precision/recall metrics on the same dataset. Is there anything wrong with saying the average precision is simply the arithmetic mean?
$$\frac{0.95 + 0.82 + 0.92}{3} \approx 0.90$$
Or is it "better" to compute a final score from the majority vote (and if so, why? Doesn't that add extra steps for calculating model uncertainty)?
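To make the comparison concrete, here is a rough sketch of the two aggregation strategies I mean (the labels and predictions are synthetic, and I'm assuming scikit-learn's `precision_score`):

```python
import numpy as np
from sklearn.metrics import precision_score

# Synthetic ground truth and three binary (0/1) classifiers,
# simulated as noisy copies of the true labels (illustrative only).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=100)
preds = [np.where(rng.random(100) < 0.85, y_true, 1 - y_true)
         for _ in range(3)]

# Strategy 1: arithmetic mean of each classifier's precision.
per_clf = [precision_score(y_true, p) for p in preds]
mean_precision = np.mean(per_clf)

# Strategy 2: precision of the majority-vote ensemble.
votes = np.sum(preds, axis=0)        # number of 1-votes per sample
majority = (votes >= 2).astype(int)  # at least 2 of 3 classifiers say 1
vote_precision = precision_score(y_true, majority)

print("per-classifier precision:", [f"{p:.2f}" for p in per_clf])
print(f"mean of precisions:      {mean_precision:.2f}")
print(f"majority-vote precision: {vote_precision:.2f}")
```

The two numbers generally differ, since the majority vote produces a single new set of predictions rather than averaging metrics of the individual classifiers.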