How would one infer the number of people who took a test from the percentages of people who answered each question correctly?

For example, given the per-question percentages 1. 85%, 2. 25%, 3. 95%, 4. 15%, 5. 35%, the answer would be $ n = 20 $.

A caveat is that these percentages actually come with some noise, so you cannot be sure that reading something like 35.6% as 89/250 is a better answer than 7/20 (in fact, the larger the estimate of N, the less likely the estimate is to be true).

I hope this question is clear and I assume this will require Bayesian methods.

(Using Python if that matters)
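To make the setup concrete, here is a minimal non-Bayesian sketch of the kind of search I have in mind: for each candidate n, check whether every percentage lies within some tolerance of a fraction k/n, and return the smallest n that fits. The function names, the `tolerance` parameter (allowed noise, in percentage points), and the `max_n` cap are assumptions made for illustration only.

```python
def max_rounding_error(percentages, n):
    """Largest gap (in percentage points) between any observed percentage
    and the nearest value 100 * k / n achievable with n test takers."""
    worst = 0.0
    for p in percentages:
        k = round(p * n / 100)  # nearest whole number of correct answers
        worst = max(worst, abs(p - 100 * k / n))
    return worst


def smallest_consistent_n(percentages, tolerance=0.5, max_n=1000):
    """Smallest n for which every percentage is within `tolerance`
    percentage points of some k/n; None if no n <= max_n qualifies."""
    for n in range(1, max_n + 1):
        if max_rounding_error(percentages, n) <= tolerance:
            return n
    return None


observed = [85, 25, 95, 15, 35]
print(smallest_consistent_n(observed))  # prints 20
```

Note that `max_rounding_error(observed, 40)` is also zero, so the data alone cannot separate 20 from its multiples; taking the smallest fitting n only encodes the preference for small N stated above, and the result depends heavily on the chosen tolerance.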

SARose
  • I guess you're aware of this but it's worth making explicit -- even if you have precise percentages (infinite d.p. with no noise or rounding error) one cannot rule out integer multiples of the lowest common denominator (if all the results are multiples of 1/20 you can't exclude the possibility that 40 or 60 people did the test, though it becomes increasingly unlikely that multiples of 2 or 3 people always get each question correct). – Glen_b Jan 25 '17 at 02:05
  • Indeed I know this. The lowest "best" common denominator would be the best and all larger ones would be ignored. – SARose Jan 25 '17 at 02:23
  • There's a somewhat related question (but not a duplicate) [here](http://stats.stackexchange.com/questions/51103/simple-multiple-choice-for-test-statistic-and-significance-level/). Silverfish gives some related discussion of a similar situation [here](http://stats.stackexchange.com/a/153630/805). There's also another somewhat similar question on site (again, not a duplicate) with more extensive analysis of a two-sample case, but I wasn't able to turn it up via search. (These all relate to testing rather than estimation, but the unknown sample size (unknown denominator) issue is the same.) – Glen_b Jan 25 '17 at 02:25
