4

I hope that this question has not been asked before, but i couldn't find an answer to it online or by myself. I came by that question during a project of mine:

If i have an election with n voters and m possible choices, and the choices are equally likely selected by each voter, what would be the probability of at least 2 options receiving the same amount of votes (which is higher than the amount of votes for the other options, hence a shared majority between two or more choices)?

I assumed this would be just combinatorics, but couldn't solve it by myself.

Any insights are greatly appreciated. Thanks in advance, and again sorry if this has been asked before.

Shayan Shafiq
  • 633
  • 6
  • 17

1 Answers1

2

Each candidate has probability $1/m$ of being voted for by any voter, a constant. However, each candidate's total vote count is not independent of the other candidates' vote counts, because each voter can only vote once. If a voter votes for candidate $A$, the vote cannot also be cast for candidate $B.$ This is a multinomial distribution, and the joint probability that each candidate $i$ will have $y_i$ votes is given by \begin{align*} p(y_1,y_2,\dots,y_m) &=\frac{n!}{y_1!\,y_2!\cdots y_m!}\,(1/m)^{y_1}(1/m)^{y_2}\cdots(1/m)^{y_m}\\ &=\frac{n!}{y_1!\,y_2!\cdots y_m!}\,\frac{1}{m^n}, \end{align*} since $\sum y_i=n.$ But if we assume that $n\gg m,$ we can examine each candidate's distribution as approximately binomial (and independent of the other candidates' distributions). This would enable us to use order statistics to answer the question. Under this approximation, a single candidate's distribution would be $$f(y_i)=\binom{n}{y_i}(1/m)^{y_i}(1-1/m)^{n-y_i}.$$ Order statistics with discrete distributions are somewhat annoying - the analysis is quite different from continuous distributions. The wiki page has a good section on how to discover the distribution of the maximum order statistic. Following that procedure, we define: \begin{align*} p_1&=P(Y<y)\\ &=F(y)-f(y)\\ p_2&=P(Y=y)\\ &=f(y)\\ p_3&=P(Y>y)\\ &=1-F(y). \end{align*} Note that $$F(y):=\sum_{x=0}^{y}\binom{n}{x}(1/m)^x(1-1/m)^{n-x}.$$ This has rather messy closed-form expression involving the regularized hypergeometric function: $$F(y)=\left(\frac{m-1}{m}\right)^{\!\!n}\left(\!\left(\frac{m}{m-1}\right)^{\!\!n}-(m-1)^{-y-1}(y+1)!\binom{n}{y+1}\,_2\tilde{F}_1\left(1,-n+y+1;y+2;\frac{1}{1-m}\right)\!\right).$$ In any case, we compute the next three quantities regarding $Y_{(m)}:=\max(Y_1,Y_2,\dots,Y_m)$ as \begin{align*} P(Y_{(m)}\le y)&=\sum_{j=0}^{n-m}\binom{n}{j}p_3^j(p_1+p_2)^{n-j}\\ P(Y_{(m)}<y)&=\sum_{j=0}^{n-m}\binom{n}{j}(p_2+p_3)^jp_1^{n-j}\\ P(Y_{(m)}=y)&=P(Y_{(n)}\le y)-P(Y_{(n)}<y)\\ &=\sum_{j=0}^{n-m}\binom{n}{j}p_3^j(p_1+p_2)^{n-j}-\sum_{j=0}^{n-m}\binom{n}{j}(p_2+p_3)^jp_1^{n-j}. \end{align*} To finish up, we note that the $(m-1)$th order statistic is going to be very similar - we can approximate it as the same probability as what we just calculated. We wind up with: $$P=\left[\sum_{j=0}^{n-m}\left(\binom{n}{j}\left(p_3^j(p_1+p_2)^{n-j}-(p_2+p_3)^jp_1^{n-j}\right)\right)\right]^2.$$

Adrian Keister
  • 3,664
  • 5
  • 18
  • 35
  • 1
    The reasons I don't believe your answer are (1) it shouldn't be a perfect square and (2) in any case $m\ge n \ge 2$ it yields the wrong value. For instance, with $m=n=2$ there is only one term in the sum where $j=n-j=0,$ whence the sum equals $0,$ so you are trying to tell us $P=0.$ But enumeration of the four possible voting patterns shows $P$ must be $1/2.$ – whuber Aug 18 '21 at 12:15
  • @whuber No doubt you are right, but the special cases you mention aren't anywhere near my approximation assumptions. I have assumed that $n\gg m,$ and that the second-largest order statistic is distributed approximately the same as the maximum order statistic. – Adrian Keister Aug 18 '21 at 13:54
  • 2
    Fair enough. But when you make an approximation, it's a good idea to be explicit about the conditions under which it might apply and, if possible, to provide error estimates or bounds. – whuber Aug 18 '21 at 13:58
  • My stats chops aren't up to providing error estimates, I don't think. Do you have any ideas for bounding the error on my estimate? – Adrian Keister Aug 18 '21 at 14:08
  • 1
    I would start by recognizing the hypergeometric function as being both (a) an incomplete Beta integral and (b) a Binomial CDF. The latter can be approximated with the Normal CDF. – whuber Aug 18 '21 at 14:54
  • @whuber: I have a couple newer questions: https://stats.stackexchange.com/questions/540162/uniformly-most-powerful-test-for-weibull-distribution and https://stats.stackexchange.com/questions/541296/binomial-distribution-likelihood-ratio-test-for-equality-of-several-proportions. I was dearly hoping you'd take a look at them, please? – Adrian Keister Aug 25 '21 at 15:54