
Suppose I have a population of exactly 100,000 voters, and I know each of them voted for one of five candidates: A, B, C, D, or E. For example, the results of a random sample of 100 voters might be A = 40%, B = 30%, C = 10%, D = 10%, E = 10%.

If I incrementally increase my sample size to 200 people, count the votes, then to 300, 400, etc., my prediction becomes more accurate as the sample grows. However, larger samples are also more costly. What test or metric can I apply to random samples to determine the smallest sample size that lets me say:

"with this population size (voters) and this number of options (candidates), this sample size of x has a 95% probability of detecting the winner (e.g A>B)"

Common sense tells me that if the final election result is A = 50%, B = 49%, the sample needed to find the winner will be much larger than if it is A = 90%, B = 9%. But I suspect there is a test that looks at how incremental increases in sample size affect the result, and that could tell me that beyond a certain point a larger sample is unlikely to change the reliability of my prediction.
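To make this concrete, here is a rough simulation sketch of what I mean (the vote counts, sample sizes, and trial count below are hypothetical, and `detection_probability` is just an illustrative helper, not an established test):

```python
# Hypothetical illustration: estimate how often the leading candidate in a
# sample of size n matches the true winner, drawing without replacement from
# the finite population of 100,000 voters.
import numpy as np

rng = np.random.default_rng(0)

def detection_probability(vote_counts, n, trials=5_000):
    """Fraction of simulated samples in which candidate 0 (the true winner) leads."""
    hits = 0
    for _ in range(trials):
        # Draw n ballots without replacement (multivariate hypergeometric counts).
        tallies = rng.multivariate_hypergeometric(vote_counts, n)
        hits += tallies.argmax() == 0   # ties are (crudely) credited to candidate 0
    return hits / trials

scenarios = {
    "close race (A=50%, B=49%, C=1%)": [50_000, 49_000, 1_000],
    "landslide  (A=90%, B=9%,  C=1%)": [90_000, 9_000, 1_000],
}
for name, counts in scenarios.items():
    for n in (100, 400, 1_600, 6_400):
        print(f"{name}  n={n:5d}  P(sample leader = A) ≈ "
              f"{detection_probability(counts, n):.3f}")
```

In this sketch the landslide scenario is detected almost surely even at small n, while the close race needs a far larger sample, which is exactly the dependence on the (unknown) margin that I would like the test to handle.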

  • Try doing a search (on Google or this site) for *power analysis*. – tddevlin Aug 16 '18 at 17:28
  • Are you sampling with or without replacement? – Sebastian Aug 16 '18 at 20:20
  • 1
  • @tddevlin Although power analysis is relevant, it's not going to give a useful answer in this *sequential testing* situation. The key difference (IMHO) is that this question is explicitly aware that the effect size and variances which must be input into any power calculation are themselves *uncertain estimates* based on the initial sample, and so that uncertainty needs to be accommodated. – whuber Aug 16 '18 at 21:22
  • Orangetree: The samples would be without replacement, because once someone's voting choice is known, there is no point in making it unknown again. Information in this case would be cumulative (e.g. get data on 100 people, count votes, get 100 more, count votes, get 100 more, etc.). – Sergio Henriques Aug 17 '18 at 17:09
  • Whuber: I didn't explicitly mention variance, but I believe that is a key factor in solving my problem. How can I account for an unknown effect size and for the inherent uncertainty of the variance in the initial sample? (A rough sketch of what I mean follows this comment thread.) – Sergio Henriques Aug 17 '18 at 17:20
  • I have rephrased my question and placed it here: https://stats.stackexchange.com/questions/362726/how-can-i-do-a-power-like-analysis-when-the-effect-size-is-unknown – Sergio Henriques Aug 17 '18 at 19:30
  • You'd probably want to find the proper sample size given the desired error, instead of performing a series of tests – Roberto Aug 17 '18 at 21:34
  • @Roberto I have no desired error (no H0); I just want to know the winner (with 95% certainty). – Sergio Henriques Aug 18 '18 at 20:40
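To make the concern about the uncertain effect size concrete, here is a rough sketch (entirely hypothetical numbers, only the two leading candidates, with-replacement pilot samples for simplicity) of how unstable a fixed-design sample-size recommendation becomes when the margin it needs is itself estimated from a small pilot sample:

```python
# Hypothetical illustration: the "required n" from a normal-approximation
# formula swings widely when the margin is estimated from a pilot of 100 voters.
import numpy as np

rng = np.random.default_rng(1)
Z95 = 1.6449                      # one-sided 95% standard-normal quantile
TRUE_PA, TRUE_PB = 0.52, 0.48     # assumed true shares of the two leading candidates

def required_n(p_a, p_b):
    """Approximate n so that P(sample share of A > sample share of B) is about 95%."""
    margin = p_a - p_b
    if margin <= 0:
        return np.inf                               # pilot sees A tied or behind
    # Per-ballot variance of (indicator of A) minus (indicator of B).
    var = p_a * (1 - p_a) + p_b * (1 - p_b) + 2 * p_a * p_b
    return var * (Z95 / margin) ** 2

print(f"n from the true margin: {required_n(TRUE_PA, TRUE_PB):.0f}")

# 1,000 independent pilot samples of 100 voters each, and the n each would recommend.
pilots = rng.multinomial(100, [TRUE_PA, TRUE_PB], size=1_000) / 100
ns = np.array([required_n(a, b) for a, b in pilots])
finite = ns[np.isfinite(ns)]
print(f"pilot recommendations: median {np.median(finite):.0f}, "
      f"10th-90th percentile {np.percentile(finite, 10):.0f} to {np.percentile(finite, 90):.0f}; "
      f"{100 * np.mean(~np.isfinite(ns)):.0f}% of pilots see A tied or behind")
```

The spread of those recommendations is the point: the formula's answer depends heavily on an estimate the pilot only measures roughly, which is why a procedure that keeps re-estimating as data accumulate seems needed.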

0 Answers