76

Do not vote; one vote will not reverse the election result. What's more, the probability of injury in a traffic collision on the way to the ballot box is much higher than the probability of your vote reversing the election result. What is even more, the probability that you would win the grand prize in a lottery is higher than the probability that you would reverse the election result.

What is wrong with this reasoning, if anything? Is it possible to statistically prove that one vote matters?

I know that there are some arguments like "if everybody thought like that, it would change the election result". But everybody will not think like that. Even if 20% of the electorate copy you, a great number of people will always go, and the winning candidate's margin of victory will be counted in hundreds of thousands. Your vote would count only in case of a tie.

Judging by game-theoretic gains and costs, it seems that betting on horse races is a better strategy for Sunday than going to the ballot box.

Update, March 3. I am grateful for so much material and for keeping the answers focused on the statistical part of the question. I posted an answer, not to solve the stated problem but to share and validate my line of thinking. I formulated a few assumptions there:

  • two candidates
  • unknown number of voters
  • each voter can cast a random vote on either candidate

I have shown there a solution for 6 voters (which could be the case when choosing a captain on a fishing boat). I would be interested in knowing what the odds are for each additional million voters.
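
To make these assumptions concrete, here is a minimal sketch (not the calculation from the posted answer). It assumes that "decisive" means the other voters split exactly evenly, and that each of them flips a fair coin independently. The function name `tie_probability` is made up for illustration.

```python
import math

def tie_probability(n_others: int) -> float:
    """Chance that n_others independent fair-coin voters split exactly evenly.

    Assumption (not from the question): your vote is decisive exactly when
    the other voters are tied.
    """
    if n_others % 2:  # an odd number of other voters cannot tie
        return 0.0
    half = n_others // 2
    # log of C(n, n/2) / 2^n, computed with lgamma to avoid huge integers
    log_p = (math.lgamma(n_others + 1) - 2 * math.lgamma(half + 1)
             - n_others * math.log(2))
    return math.exp(log_p)

# Six other voters, then one, two and three million other voters.
for n in (6, 1_000_000, 2_000_000, 3_000_000):
    print(n, tie_probability(n))
```

For six other voters this is 20/64 = 0.3125. For large even n the value behaves like sqrt(2 / (pi * n)), so under these (admittedly unrealistic) assumptions each additional million voters lowers the probability only slowly.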

Update, March 5. I would like to make it clear that I am interested in more or less realistic assumptions for calculating the probability of a decisive vote. More or less, because I do not want to sacrifice simplicity for precision. I have just understood that my update of March 3 formulated unrealistic assumptions. These assumptions probably yield the highest possible probability of a decisive vote, but I would be grateful if you could confirm it.

What is still unclear to me is what is meant by the number of voters in the provided formulas. Is it the maximum pool of voters or the exact number of voters? Say we have 1 million voters: is the probability calculated over all the cases from 1 to a million voters taking part in the election?

Adding more fuel to the discussion heat

In the USA, because the president is elected indirectly, your vote would be decisive only if one vote, your vote, were to flip the electors of your state, and then, owing to the votes of your electors, there was a tie in the Electoral College. Of course, this double tie condition reduces the chances that a single vote may reverse the election result even further than discussed here so far. I have opened a separate thread about that here.

Przemyslaw Remin
    This is called the [Paradox of voting](https://en.wikipedia.org/wiki/Paradox_of_voting). Since you mention optimal strategies, you may be interested to know that the so-called "pivot probability" (chance that all other votes are an exact tie so that your vote is the pivotal one) is central to the [Myerson-Weber optimal voting strategy](https://en.wikipedia.org/wiki/Tactical_voting#Myerson-Weber_strategy) . – olooney Feb 28 '20 at 15:07
  • 7
    The fallacy is assuming you know the probability of something when you don't, as well as the "return" of an outcome. – AdamO Feb 28 '20 at 17:08
  • 3
    Part of the 'truth' hidden in this reasoning may be found in its assumption that votes are *uncorrelated*. The most forceful attack on this anti-voting devil's advocacy is perhaps to be found in movement-building efforts that 'correlate' others' votes with your own. – David C. Norris Feb 28 '20 at 19:55
  • 1
    The probability of a pivotal vote (where your vote decides the outcome) is very low indeed. For US presidential elections, reasonable estimates are $10^{-7}$ or lower; see my answer. Is it lower than a lottery win? I don't think so, but the probability of getting into a car accident may very well be higher for US presidential elections – Aksakal Feb 28 '20 at 20:22
  • 4
    Also, I don't think we should be jumping on "voter suppression" wagon here. It's unhealthy to drag the political sentiment into a statistical forum, especially, when the question of rationality of voting is far from settled in the domain of rational choice studies. It's a legitimate puzzle, and you're not helping resolve it by accusations of voter suppression intent. I suggest that we remove the politics from answers. Let's stick to what we know best - stats – Aksakal Feb 28 '20 at 20:25
  • 1
    Since it hasn't yet been referenced in any of the answers: Gelman, Silver and Edlin (2010) analyse this for the 2008 Presidential Election. They use state level polling and other data to estimate how close elections will be as well as turnout. They estimate the chance the vote was pivotal at between 1 in 10 million and 1 in a billion depending on the state. Extremely small, but then you need your state to be pivotal, and your vote to be pivotal in your state. As the answers make clear - the probability can vary a lot from election to election. The quote will hold for some but not others. – CloseToC Feb 28 '20 at 21:13
  • 1
    Aside from the stats answer, one could argue that even a vote for no candidate has value. By voting, you increase voter turnout metrics in your area and demographics, increasing the value to future politicians of endorsing policies that you specifically support. – Sam Feb 28 '20 at 22:37
  • 5
    @Sam, in the USA, most of us know in advance which candidate will get ALL of our state’s electoral votes. To me, that is an incentive to send a minuscule subliminal message by adding a vote to a third-party candidate. Since almost half of us don’t bother to vote, not voting is probably perceived as just one more who doesn’t care. – WGroleau Feb 28 '20 at 22:46
  • 2
    There are two factors that, in my mind at least, increase the value of voting beyond what some of the answers say: first, in the case of a single vote making a difference, whose vote made that difference? Since you can't narrow it down to a single person, in a sense it must be EVERY person that voted for the winning candidate, multiplying the value of each vote. Second, the margin of error in any election greater than a few people is always greater than one vote. – Michael Feb 29 '20 at 01:34
  • Speaking as a person involved in building a political party: there is always more than simply a "pivot situation" in politics. Almost always political situations in any country are stale: political forces are trying to fight each other instead of _increasing influence_. This is well described in game theory and "theory of group games". When parties fight for influence, they really appreciate every single vote. – sanaris Feb 29 '20 at 20:08
  • PS. An election being a "life or death" situation for a party is far more common than "pivot voting" between two major parties. In fact, it is not pivot votes that shape the future, but the presentation of small players who build alliances (a result from game theory already well known in politics). – sanaris Feb 29 '20 at 20:11
  • 1
    @DavidC.Norris is correct to suggest that correlation may play an important role in voting. However, some research suggests that correlation might be the reason why your vote actually doesn't matter, sadly. See the discussion of correlation in section 5.1 in http://www.stat.columbia.edu/~gelman/research/published/banzhaf_bjps_final.pdf – Aksakal Feb 29 '20 at 22:33
  • Related: ["List of close election results"](https://en.wikipedia.org/wiki/List_of_close_election_results), Wikipedia. – Nat Mar 01 '20 at 14:45
  • 3
    "But everybody will not think like that." Yes, they do. Hence rates are much lower than they should otherwise have been. Just over half of the eligible US voted in each of the last four presidential elections. – Luke Sawczak Mar 01 '20 at 18:31
  • *" the probability of injury in a car accident on the way to the ballot box is much higher than your vote reversing the election result. What is even more, the probability that you would win grand prize of lottery game is higher than that you would reverse election result."* There is nothing wrong with this part; these are correct premises. What is wrong is the reasoning/conclusion to not vote, and the reason is the one you already mention in your question: *"I know that there are some arguments like.."*. Staying home or not, when a large mass decides this then it will swing the vote. The weather decides. – Sextus Empiricus Mar 01 '20 at 19:48
  • To me, this sounds like some game theory and psychology is needed. Whether or not to vote is not a decision made solely on weighing *personal* cost and benefit. This has little to do directly with statistics (it is indeed very unlikely that a difference is made by only one single vote this is currently debated in the answers but seems off-topic to me, ie ontopic for CV.se but offtopic for the question). Therefore I vote to close because the exact statistical question from the OP is unclear and the resulting diversity in approaches/answers is unclear and discussy. – Sextus Empiricus Mar 01 '20 at 20:04
  • 3
    @SextusEmpiricus, I disagree on closing the question. Yes, the full answer requires us to go beyond stats, however the premise of the question is based on stats. It's the observation that probability that your vote is tie-breaking is very low. Therefore, discussion of statistical aspects of the argument is relevant to this forum as long as we stick to stats and don't venture into politics of elections. – Aksakal Mar 02 '20 at 00:37
  • 1
    @Aksakal I'd agree that the question relates to statistics, and it is not for nothing that statisticians have been spending their time on this topic. However, the statistical issue is not at all clearly/explicitly stated in the question. Currently, the question is stated like "what is wrong with the reasoning" (this actually makes it a loaded question because 'it doesn't need to be wrong', at least not the statistical part; and thus this opens up much more off-topic discussion about the philosophical part) or like "that one vote matters" (where it is not *defined* what 'matters' means). – Sextus Empiricus Mar 02 '20 at 01:06
  • Some questions like 'what is the probability that one vote will reverse the result?' would already be more clear. But, with the current question we have people answering like *"the answer cannot be derived from pure statistics point of view in my opinion"* – Sextus Empiricus Mar 02 '20 at 01:09
  • If you don't vote and the "wrong" candidate wins (even Adolf Hitler won a democratic election) then you were part of the problem. Do you want to be part of the problem? – user253751 Mar 02 '20 at 12:42
  • 2
    While not directly related to the voting itself and difficult to get hard numbers of the effect, there could be a chain reaction / snowball effect if you tell other people you're not voting, or manage to convince them it's pointless, and this leads some of them to not vote either (and they tell others they aren't voting, which leads some of them to not vote, etc.). – NotThatGuy Mar 02 '20 at 13:35
  • 1
    "if everybody thought like that, it would change the election result" - Unless you're personally convincing people to vote or not vote, this is a fallacy. But it's not a fallacy because not everyone will think like this, but rather because whether or not you vote will not affect whether or not they vote. – NotThatGuy Mar 02 '20 at 13:47
  • @NotThatGuy Merely telling people that you do not vote because it's useless probably has that effect. – user253751 Mar 02 '20 at 15:13

7 Answers

101

It's wrong in part because it's based on a mathematical fallacy. (It's even more wrong because it's such blatant voter-suppression propaganda, but that's not a suitable topic for discussion here.)

The implicit context is one in which an election looks like it's on the fence. One reasonable model is that there will be $n$ voters (not including you) of whom approximately $m_1\lt n/2$ will definitely vote for one candidate and approximately $m_2\approx m_1$ will vote for the other, leaving $n-(m_1+m_2)$ "undecideds" who will make up their minds on the spot randomly, as if they were flipping coins.

Most people--including those with strong mathematical backgrounds--will guess that the chance of a perfect tie in this model is astronomically small. (I have tested this assertion by actually asking undergraduate math majors.) The correct answer is surprising.

First, figure there's about a $1/2$ chance $n$ is odd, which means a tie is impossible. To account for this, we'll throw in a factor of $1/2$ in the end.

Let's consider the remaining situation where $n=2k$ is even. The chance of a tie in this model is given by the Binomial distribution as

$$\Pr(\text{Tie}) = \binom{n - m_1 - m_2}{k - m_1} 2^{m_1+m_2-n}.$$
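
As a sanity check on the formula above, here is a minimal sketch that evaluates it exactly for small cases. The function name `pr_tie` and the example numbers are illustrative assumptions, not part of the model's statement.

```python
from math import comb

def pr_tie(n: int, m1: int, m2: int) -> float:
    """Pr(Tie) among n other voters, n even.

    m1 and m2 voters are committed to each candidate; the remaining
    n - m1 - m2 undecided voters flip independent fair coins.
    """
    if n % 2:
        raise ValueError("n must be even for a tie to be possible")
    k = n // 2
    undecided = n - m1 - m2
    needed = k - m1  # undecided votes the first candidate still needs to reach k
    if undecided < 0 or not 0 <= needed <= undecided:
        return 0.0
    return comb(undecided, needed) / 2 ** undecided

# Hypothetical small electorate: 100 other voters, 10 of them undecided.
print(pr_tie(100, 45, 45))  # balanced committed blocs
print(pr_tie(100, 46, 44))  # slightly unbalanced blocs
```

Running it with unequal `m1` and `m2` is a quick way to see how the chance of a tie responds to an imbalance between the committed blocs.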

When $m_1\approx m_2,$ let $m = (m_1+m_2)/2$ (and round it if necessary). The chances don't depend much on small deviations between the $m_i$ and $m,$ so writing $N=k-m,$ an excellent approximation of the Binomial coefficient is

$$\binom{n - m_1-m_2}{k - m_1} \approx \binom{2(k-m)}{k-m} = \binom{2N}{N} \approx \frac{2^{2N}}{\sqrt{N\pi}}.$$

The last approximation, due to Stirling's Formula, works well even when $N$ is small (larger than $10$ will do).
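
The quality of the Stirling approximation can be checked directly. The sketch below compares the exact value of $\binom{2N}{N}2^{-2N}$ with $1/\sqrt{N\pi}$ for a few values of $N$; the function names are made up for illustration.

```python
from math import comb, pi, sqrt

def central_exact(N: int) -> float:
    """Exact value of C(2N, N) / 2^(2N)."""
    return comb(2 * N, N) / 4 ** N

def central_stirling(N: int) -> float:
    """Stirling-based approximation 1 / sqrt(N * pi)."""
    return 1 / sqrt(N * pi)

for N in (1, 10, 100, 1000):
    exact, approx = central_exact(N), central_stirling(N)
    # last column: relative error of the approximation
    print(N, exact, approx, approx / exact - 1)
```

The relative error shrinks roughly like $1/(8N)$, which is why the approximation is already serviceable around $N = 10$.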

Putting these results together (the powers of two cancel because $n - m_1 - m_2 \approx 2N$), and including the factor of $1/2$ promised at the outset, gives a good estimate of the chance of a tie as

$$\Pr(\text{Tie}) \approx \frac{1}{2\sqrt{N\pi}}.$$

In such a case, your vote will tip the election. What are the chances? In the most extreme case, imagine a direct popular vote involving, say, $10^8$ people (close to the number who vote in a US presidential election). Typically about 90% of people's minds are clearly decided, so we might take $N$ to be on the order of $10^7.$ Now

$$\frac{1}{2\sqrt{10^7\pi}} \approx 10^{-4}.$$

That is, your participation in a close election involving one hundred million people still has about a $0.01\%$ chance of changing the outcome!

In practice, most elections involve between a few dozen and a few million voters. Over this range, your chance of affecting the results (under the foregoing assumptions, of course) ranges from about $10\%$ (with just ten undecided voters) to $1\%$ (with a thousand undecided voters) to $0.1\%$ (with a hundred thousand undecided voters).
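
These figures can be reproduced with a few lines. The sketch below assumes, as in the derivation, that the undecided voters number $2N$; the function name `decisive_chance` is hypothetical.

```python
from math import pi, sqrt

def decisive_chance(undecided: int) -> float:
    """Approximate chance of a tie among the others: 1 / (2 * sqrt(N * pi)).

    Assumes a race that is expected to be close, with N = undecided / 2.
    """
    N = undecided / 2
    return 1 / (2 * sqrt(N * pi))

for undecided in (10, 1_000, 100_000, 10_000_000):
    print(f"{undecided:>10,d} undecided -> {decisive_chance(undecided):.4%}")
```

Multiplying the number of undecided voters by 100 divides the chance by only 10, which is the inverse square root behaviour at work.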

In summary, the chance that your vote swings a closely-contested election tends to be inversely proportional to the square root of the number of undecided voters. Consequently, voting is important even when the electorate is large.


The history of US state and national elections supports this analysis. Remember, for just one recent example, how the 2000 US presidential election was decided by a plurality in the state of Florida (with several million voters) that could not have exceeded a few hundred--and probably, if it had been checked more closely, would have been even narrower.

If (based on recent election outcomes) it appears there is, say, a few percent chance that an election involving a few million people will be decided by at most a few hundred votes, then the chance that the next such election is decided by just one vote (intuitively) must be at least a hundredth of one percent. That is about one-tenth of what this inverse square root law predicts. But that means the history of voting and this analysis are in good agreement, because this analysis applies only to close races--and most are not close.
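
The back-of-the-envelope comparison in the previous paragraph can be written out explicitly. All three inputs below are assumed, illustrative readings of "a few percent", "a few hundred" and the number of undecided voters; they are not measured data.

```python
from math import pi, sqrt

# Assumed illustrative readings of the text's rough figures (not measured data).
p_within_window = 0.03      # "a few percent" chance the margin falls in the window
window_votes = 300          # "a few hundred" votes
undecided = 100_000         # assumed undecided voters in a close race of a few million

# If margins are spread roughly evenly over the window, each single-vote
# outcome gets about an equal share of the window's probability.
p_one_vote_empirical = p_within_window / window_votes

# Inverse square root law for a race that is known to be close.
p_one_vote_model = 1 / (2 * sqrt((undecided / 2) * pi))

print(p_one_vote_empirical, p_one_vote_model,
      p_one_vote_empirical / p_one_vote_model)
```

With these assumed inputs the empirical-style figure is about a hundredth of one percent and the ratio is on the order of one-tenth; different readings of "a few" shift both numbers.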

For more (anecdotal) examples of this type, across the world, see the Wikipedia article on close election results. It includes a table of about 200 examples. Unfortunately, it reports the margin of victory as a proportion of the total. As we have seen, regardless of whether all (or even most) assumptions of this analysis hold, a more meaningful measure of the closeness of an election would be the margin divided by the square root of the total.
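
To illustrate how the two measures of closeness can disagree, here is a small sketch with two entirely hypothetical elections (the numbers are invented for illustration and are not taken from the Wikipedia table).

```python
from math import sqrt

def margin_as_proportion(margin: int, total: int) -> float:
    """Margin of victory divided by the total number of votes."""
    return margin / total

def margin_sqrt_scaled(margin: int, total: int) -> float:
    """Margin of victory divided by the square root of the total."""
    return margin / sqrt(total)

# Hypothetical elections: (label, margin, total votes).
elections = [("A: large electorate", 500, 6_000_000),
             ("B: small electorate", 2, 400)]

for label, margin, total in elections:
    print(label,
          margin_as_proportion(margin, total),
          margin_sqrt_scaled(margin, total))
```

By proportion, election A looks far closer; scaled by the square root of the total, election B is the closer one.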


By the way, your chance of an injury due to driving to the ballot box (if you need to drive at all) can be estimated as the rate of injuries annually (about one percent) divided by the average number of trips (or distance-weighted trips) annually, which is several hundred. We obtain a number well below $0.01\%.$
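
Spelled out, with 300 trips per year as an assumed stand-in for "several hundred":

```python
# Rough inputs from the paragraph above; the trip count is an assumed
# illustrative value for "several hundred".
annual_injury_rate = 0.01   # about one percent of people injured per year
trips_per_year = 300        # assumed number of trips per year

injury_chance_per_trip = annual_injury_rate / trips_per_year
print(f"{injury_chance_per_trip:.4%}")  # chance of injury on a single trip
```

Any trip count above 100 keeps the per-trip figure below $0.01\%.$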

Your chance of winning the lottery grand prize? Depending on the lottery, one in a million or less.

The quotation in the question is not only scurrilous, it is outright false.

whuber
  • 7
    This is probably the most convincing reason to vote I've seen. Now, to paraphrase it in a way that laymen could follow... – Forgottenscience Feb 28 '20 at 15:13
  • 22
    This is not a proof, that the statement in the question is wrong, since it relies on very strong assumptions: 1) $m_1 \approx m_2$, which does not apply in most cases (except maybe U.S. presidency election). 2) Undecided voters choose with uniform probability across all possible choices. This does not apply empirically. Otherwise, small parties with few supporters would receive a large amount of undecided voters. 3) There is a random component, that only regards undecided voters. The adequacy of such a model is left to be proven. – ghlavin Feb 28 '20 at 16:02
  • 28
    @ghlavin You have unrealistic standards of "proof" in this setting. The question was not asking for a mathematical "proof," because no such thing is possible. I have not elaborated on the consequences of the assumptions I had to make, anticipating that most readers will immediately see how the results change when the assumptions change--especially when there is a large imbalance between $m_1$ and $m_2.$ Those expectations were implicit in my concluding remarks pointing out that not all elections are closely contested and adducing *empirical support* for the conclusion, contrary to your claim. – whuber Feb 28 '20 at 16:09
  • 3
    @ghlavin - 1. It was not presented as a "proof". 2. As was stated in the initial sentence of the response, and repeated near the end, the implicit context of the OP was $m_1 \approx m_2$. Obviously things are different in an election where one candidate is going to get 80% of the vote! 3. Your second point seems irrelevant to the situation presented, which only has two candidates. 4. It is well-known that, in U. S. elections at any rate, most people's choices are made even before the candidate is selected, and undecided voters make up only a small percentage of the electorate. – jbowman Feb 28 '20 at 16:09
  • 2
    @Forgotten I agree with your sentiment. Anticipating it somewhat, I encapsulated the mathematical analysis within one paragraph and subsequently summarized it: first with the $1/(2\sqrt{n\pi})$ formula and later as an "inverse square root law." One could paraphrase that further still as a "diminishing returns" kind of law, but at appreciable risk of losing the insight afforded by the square root, which is what leads to the surprisingly high chances even when there are huge numbers of voters. – whuber Feb 28 '20 at 16:13
  • 2
    @whuber: The standards of "proof" are set as a consequence of the statement "It's wrong in part because it's based on a mathematical fallacy". But it is only a mathematical fallacy in a very particular situation. – ghlavin Feb 28 '20 at 16:15
  • 2
    Well, it is a mathematical fallacy in the by far most interesting situation, so your counter-claim doesn't say much. – Forgottenscience Feb 28 '20 at 16:18
  • 22
    @ghlavin I am not the one positing the fallacy! The quotation states, in an unqualified manner, that the chance of one vote tipping an election are so tiny as to be negligible, *no matter what.* Logically, it suffices to exhibit just one example where that general statement is incorrect (and there are many trivial ones, such as elections with small numbers of voters). As a matter of fairness and applicability, such an example should be realistic, plausible, and occur sufficiently often to be of general interest. I have met those standards. – whuber Feb 28 '20 at 16:19
  • 3
    @jbowman In the examples brought in the answer of whuber, you don't have to go up to 80 %, 51 % would easily be enough. – ghlavin Feb 28 '20 at 16:25
  • 2
    @whuber "As a matter of fairness and applicability, such an example should be realistic, plausible, and occur sufficiently often to be of general interest. I have met those standards" Sorry, I disagree, this is the whole point of my comments. Even if we consider US presidency election, and assume equal supporters, why only considering the votes of undecided voters as random? Why not considering all voters having a random component? – ghlavin Feb 28 '20 at 16:34
  • 4
    @ghlavin Because, as Forgottenscience has pointed out, that's not how political scientists, pollsters, pundits, or even US voters believe it works. If you wish to challenge my approach--and it appears you are motivated to do so--then focus on the realism and interpretability of modeling the votes of undecided voters as being *random* and *independent.* Their independence is probably the most questionable assumption whose violation has the greatest effect. However, if the votes are positively correlated, that only *increases* the probability estimate. – whuber Feb 28 '20 at 16:47
  • @Aksakal In that case are you denying the applicability of the Binomial distribution or are you challenging the asymptotic behavior of the central Binomial coefficients? Remember, I'm not directly analyzing the *empirical frequencies* of close votes (which is the topic of that paper); I'm only reasoning about elections that *are expected to be close in the first place.* I am referring to empirical evidence only to demonstrate that the inverse square root law stands up to evidence. – whuber Feb 28 '20 at 19:07
  • 3
    @whuber, I think you're grossly overestimating the probability of the tie by applying a square root, while it must be proportional to the $1/n$, see Eq.3 here: nber.org/papers/w8590.pdf – Aksakal Feb 28 '20 at 19:13
  • 3
    @whuber No, they're not denying the applicability of the Binomial, they're applying it more carefully – Aksakal Feb 28 '20 at 19:13
  • 1
    @Aksakal As far as I can tell, they are not applying a Binomial model at all: on pp 7 and 8 they are replacing it by a *uniform distribution* within a vague neighborhood of the *mode* of the Binomial distribution! That's where they get their reciprocal relationship, but it's meaningless. They also obtain another reciprocal relationship, but it's with a different model (p. 19) in which they integrate over a broad prior distribution. Although that might be relevant to their investigation of *all* elections, it's not directly applicable to a voting decision in any *particular* contest. – whuber Feb 28 '20 at 19:21
  • 5
    (Continued) A formal way to express the distinction is this: my answer analyzes a *conditional* probability of casting a deciding vote when an election looks close; the paper analyzes the *unconditional* chance that an election is decided by at most one vote. Obviously the latter is much smaller than the former. Which is relevant is worthy of discussion, and both provide insight, but bear in mind that voter suppression efforts (as exemplified by the question's opening quotation) tend to occur more frequently in elections that are perceived to be close. – whuber Feb 28 '20 at 19:24
  • 3
    @whuber, a main weakness in your solution is that you treat $m_1$ and $m_2$ as given, while others integrate over $m_1$ and $m_2$. In reality, which 2016 demonstrated, nobody knows $m_1$ and $m_2$. So, it's more realistic to assume that maybe there's probability $\Pi$ of voting democrat, and that Binomial distribution will define the realized $m_i$ over which you further integrate. That's why you get such a high probability – Aksakal Feb 28 '20 at 19:32
  • 1
    @Aksakal I don't integrate over them because provided they are close to equal, it doesn't matter. What ultimately matters is the *range* of integration. That's worthy of further consideration, but it's not a weakness. Indeed, this helps point out what I see as a limitation of the integration approaches, which is that they paper over this issue by using a prior. For the present purposes that's immaterial: it is enough to show how and why, and approximately under what circumstances, the common perception that "my vote is unlikely to count" may be egregiously wrong. – whuber Feb 28 '20 at 19:36
  • 2
    @whuber, you should look at the empirical results. clearly, your numbers are much higher than empirical estimates. I understand it's a great sentiment you're promoting, but your calculation logic is wrong. there must be something that we don't understand about rationality of voting, but it's not in calculations of the probability of a tie – Aksakal Feb 28 '20 at 19:47
  • 4
    @Aksakal AFAIK, this is not a discussion about rationality of voting: it's an effort to assess the value of votes. It's not at all clear my numbers are higher than empirical estimates, for the reasons I have outlined. I see no errors in calculation, so concern ought to focus on the reasonableness and scope of applicability of its assumptions. I think a much stronger way to criticize my analysis would be not to keep insinuating it must somehow be incorrect, but to frame it in Bayesian terms, where it is tantamount to adopting a strong prior distribution. That would be more productive. – whuber Feb 28 '20 at 20:26
  • 1
    "Typically about 90% of people's minds are clearly decided" So... you just ignore them in your math? Just because their minds are decided doesn't mean they don't affect the outcome of the election. It seems like your math done on N=10^7 doesn't actually help much when finding the chances of a tie when the actual N is 10^8. – Indigenuity Feb 28 '20 at 23:04
  • 2
    @Indigenuity I invite you to read this post more carefully. Those people are explicitly accommodated in the calculation through the variables $m_1$ and $m_2.$ – whuber Feb 28 '20 at 23:35
  • 1
    @jbowman *"the implicit context of the OP was $m_1 \approx m_2$"* Where? – user76284 Feb 29 '20 at 00:26
  • @whuber Why do you say the OP's quote is a "voter suppression effort" and "blatant voter-suppression propaganda"? I don't think I've seen such a strangely heated reaction to a simple observation on Stats.SE before. – user76284 Feb 29 '20 at 01:06
  • 5
    @user76284 - 1) You're right, it was actually in the first sentence of the second paragraph of the answer. 2) In the U.S, one widely-accepted voter theory is that there are "base" voters who will come out to vote almost no matter what and others who may or may not come out to vote. If your base is larger than the other side's base, and your interest is in winning rather than in encouraging broad voter participation in the process, you have a strong incentive to discourage the others from voting, and reminding them repeatedly that their votes (individually) won't matter is one way to do so. – jbowman Feb 29 '20 at 01:06
  • 4
    @user76284 - Also, adding in "remember, you might get injured on your way to cast your useless vote!" sort of gives the game away, don't you think? – jbowman Feb 29 '20 at 01:08
  • @user76284 - our comments crossed; evidently you don't think so! Actually, I guess it would be helpful to know where the original quote came from. – jbowman Feb 29 '20 at 01:09
  • 1
    @jbowman Regarding your second point, sure. That has nothing to do with the OP's quote *being* a "voter suppression effort" and "blatant voter-suppression propaganda". That kind of presumption of malice is inflammatory and doesn't belong on a site about math like Stats.SE. Regarding the injury, the point of that is to show that the utility might be not only not positive, but negative. – user76284 Feb 29 '20 at 01:17
  • @user76284 - exactly, which is exactly what you would do if you were trying to convince people not to bother to vote. Look at the answer to the last question in this interview: https://www.theledger.com/news/20180831/low-voter-turnout-as-troubling-as-possible-hacking - which I selected because it came up fourth on my search of ""don't bother to vote" voter suppression", and it looked the most relevant. Now, maybe it is, maybe it isn't, but it sure looks like it. Knowing the source would help, though. – jbowman Feb 29 '20 at 01:17
  • @whuber I've suggested an edit that adds a graph showing the probability of a tie against the fraction of decided voters for one outcome, given the fraction of decided voters for the other outcome and the total number of voters. Let me know what you think. – user76284 Feb 29 '20 at 02:36
  • 4
    @whuber, for some reason you keep discarding all the empirical evidence and prior studies on the probabilities of decisive vote. Look at Fig 2 in Gelman et al (2008) http://www.stat.columbia.edu/~gelman/research/published/probdecisive2.pdf It's irrefutably $O(1/n)$ when the split is close to 50/50, and the numbers are consistent with what I cited in the order of $10^{-7}-10^{-8}$ in large states, much much lower than your estimate – Aksakal Feb 29 '20 at 04:46
  • @jbowman The fact that a person trying to do X might say statement Y doesn't mean every person saying statement Y is trying to do X. And to further strengthen my point, the source of the quote is irrelevant (a [red herring](https://en.wikipedia.org/wiki/Red_herring)), since we're discussing the mathematical correctness of the quote, not who said it. – user76284 Feb 29 '20 at 17:11
  • @whuber Did you reject my edit? – user76284 Feb 29 '20 at 17:59
  • 1
    @user76284 Yes, I did. That doesn't mean I disagree with it--but it was so substantial a change that I did not want to have to give it the thought and consideration needed to verify that it fit with everything else in my post and was fully correct. You are welcome to post your own answer. – whuber Feb 29 '20 at 18:17
  • @whuber Fair enough. I’ll post it as a separate answer. I think it’s only fair to tone down the claim that the quote is “scurrilous” and “outright false” given the more complete picture, though, not to mention the accusation of “blatant voter-suppression propaganda”. Don’t you agree? – user76284 Feb 29 '20 at 18:25
  • 2
    "The history of US state and national elections supports this analysis" - the analysis in this answer is in complete disagreement with empirical evidence. In fact, this statement is absolutely baseless. Contrary to this answer, the empirical evidence points to significantly lower probabilities. @whuber, I think you need to revise your answer in the face of the studies and evidence that I presented. – Aksakal Feb 29 '20 at 20:44
  • 2
    @user76284 I take your comments seriously and will consider how to tone down those claims. I think it's plain, though, that the original quotation is a voter-suppression effort: that's *prima facie obvious.* – whuber Feb 29 '20 at 21:19
  • @Aksakal Please consider the possibility that you misinterpret the empirical evidence and/or my analysis. The evidence you cite concerns the frequency of "pivot elections" among *all* elections, not those that are perceived beforehand to be potentially close. It is not directly relevant except insofar as it provides empirical *lower bounds* on my estimates--and I suspect those lower bounds would be an order of magnitude too low, given that closely contested elections are relatively rare. – whuber Feb 29 '20 at 21:23
  • *a direct popular vote involving, say, 10⁸ people* — does such a vote actually exist? US Presidents are elected by electoral college so only the number of voters per electoral college unit (state) matters, US Congress has districts, proportional systems have "votes per seat" much smaller than 10⁸, and many elections are local or regional. Seems like the typical order of magnitude should be closer to 10⁶ than 10⁸, except for referendums in large countries. – gerrit Feb 29 '20 at 23:02
  • 1
    @Forgottenscience "Now, to paraphrase it in a way that laymen could follow" The intuition is that the binomial distribution has standard deviation ~1/sqrt(n), so if we crudely approximate it as a uniform distribution with similar width, we get ~1/sqrt(n) as the tie probability. In less math-y terms, this is saying that ties are more likely than 1/N because the result distribution is focused on an area much smaller than N. (For the purpose of this comment, I'm neglecting real life considerations that others have pointed out.) – blah Feb 29 '20 at 23:36
  • 1
    @gerrit, in multiple links that I gave in my answers and comments, electoral college was taken into account. the probability is still extremely small to the degree where it begs a question: why vote at all? election modeling is a complicated subject, not as simple as the binomial model presented here. also, I personally always vote in every election regardless of what theory suggests as a matter of principle – Aksakal Mar 01 '20 at 00:48
  • 4
    Given the strong assumptions you are making in your answer, I don't think you have enough evidence to conclude: "The quotation in the question is not only scurrilous, it is outright false." – Akavall Mar 01 '20 at 19:55
  • 2
    _"When $m1 \approx m2$, let $m=(m1+m2)/2$ (and round it if necessary). The chances don't depend much on small deviations between the $m_i$ and $m$,"_ This is misleading. Consider your example with $m=4.5 \cdot 10^7$. A 0.1 % error ($m_1 = 1.001m$, $m_2=0.999m$) would mean that instead of $5'000'000$ out of $10^7$, we're now requiring $5'045'000$ out of $10^7$. That's **28.5 standard deviations** off the centre, since the standard deviation of $\mathrm{Bin}(10^7,1/2)$ is $1581$. – JiK Mar 01 '20 at 21:04
  • 1
    In other words, this answer seems to rely on the approximation that $\binom{2N}{K}$ is close to $\binom{2N}{N}$ if $K$ is close to $N$. That is not so simple when $N$ is big. – JiK Mar 01 '20 at 21:11
  • So if you are living in Washington D.C. then the statement about being more likely to get in a car accident *is* true. – Sextus Empiricus Mar 01 '20 at 21:20
  • 1
    Even after careful analysis, I would confidently also answer your test question in the third paragraph, since I would wrongly interpret $m_1 = 4.5045 \cdot 10^7$ and $m_2 = 4.4955 \cdot 10^7$ possible if you say "$m_1 \approx m_2$". You need to be much more explicit in listing the assumptions in your model if you wish to claim people making that interpretation are wrong. – JiK Mar 01 '20 at 21:29
  • 1
    @whuber Even in comments you are saying _"I don't integrate over [$m_1$ and $m_2$] because provided they are close to equal, it doesn't matter"_. I don't think you're being honest in just how close you are assuming them to be. – JiK Mar 01 '20 at 21:34
  • 2
    The [history of the U.S. presidential elections](https://web.archive.org/web/20120825102042/http://www.mit.edu/~mi22295/elections.html#ranking) shows that 2000 was actually the smallest difference of all. This difference could only be made by the voters in Florida. Only *once* before was a state in the situation that they were a decisive state *and* the difference in votes was less than 1000 (South Carolina 1876). – Sextus Empiricus Mar 01 '20 at 21:41
  • 2
    Note that I am not questioning whether the idea or the result of this analysis is correct; I'm simply pointing out that this answer, as it currently stands, is unclear in what the mathematical assumptions in this model are and how the result mathematically follows from the mathematical assumptions. – JiK Mar 01 '20 at 21:45
  • @whuber Why have you involved PI in your answer? Can you please compare your formula to "empirical" data in my answer? Your formula seems to work without PI. – Przemyslaw Remin Mar 02 '20 at 16:22
  • The factor of $1/\sqrt{\pi}$ is correct but unnecessary because the purpose of the formula is only to suggest how a probability varies with $N.$ We shouldn't care (much) about getting an absolutely correct constant, provided only it is roughly the right size. In general, the logic of my analysis dictates that whenever making approximations I should choose to *underestimate* the probability; thus I kept an initial factor of $1/2$ (although, as has been pointed out in a comment above, it's not necessary) and I kept the factor of $1/\sqrt{\pi}.$ – whuber Mar 02 '20 at 16:29
  • 2
    I think one important, unstated assumption in the answer is that $|m_1 - m_2| \ll n - m_1 - m_2$. It was stated as $m_1 \approx m_2$, but that's not the same. It's possible to have $m_1 \approx m_2$, but if $n - m_1 - m_2$ is several orders of magnitude lower than either $m_1$ or $m_2$, then the former inequality may still not hold. But that assumption is necessary for one of your steps. Intuitively stated, the assumption is that the number of "undecideds" is much larger than the difference between the "decideds". – Bridgeburners Mar 02 '20 at 17:55
  • 1
    @Forgottenscience: I can give it to you in plain English: *the closer an election is, the more important your vote becomes.* And you cannot know how close the election is going to be beforehand. – Robert Harvey Mar 02 '20 at 19:51
  • 1
    @Bridgeburners Even more, we need $|m_1 - m_2| / \sqrt{n - m_1 - m_2}$ to be small (around $1$ is good enough but not $10$). – JiK Mar 02 '20 at 20:05
33

I must disappoint you: current economic theory cannot explain why people keep showing up at elections, because voting appears to be irrational. See the survey of the literature on this subject on pages 16-35 of Geys, Benny (2006) - "‘Rational’ Theories of Voter Turnout: A Review". Voter turnout is the percentage of the voting-eligible population that shows up at the polls. In layman's terms, it appears that your vote indeed won't make a difference.

As in @whuber's answer, the analysis is closely related to the probability of casting a pivotal vote, i.e. making or breaking a tie. However, I think @whuber is making the question look simpler than it is, and is also suggesting a much higher probability of a pivotal vote than the analysis of US and European election data suggests. Voter turnout is indeed a paradox: it should be zero according to theory, yet it is close to 50% in the USA.

In my opinion, the answer cannot be derived from a purely statistical point of view. It belongs to the behavioral aspects of human action, which rational choice models explore, albeit unsatisfactorily, because people keep voting while the theory says they shouldn't.

Instrumental Voting

The instrumental voting approach (see the reference above) is the idea that your vote becomes tie-breaking, and thus decides whether you gain the benefits of electing your favorite candidate. It is described with an equation for the expected utility R: $$R=PB-C>0$$ Here, P is the probability that your vote is tie-breaking, B is the benefit you get from your candidate winning, and C is the cost associated with voting. The costs C vary and fall into roughly two categories: researching the candidates, and the logistics of voting (voter registration, driving to the polling station, etc.). Researchers who looked at these components came to the conclusion that P is so low that any positive cost C outweighs the product PB.
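As a rough illustration of the inequality above, here is a minimal Python sketch. The function names and the numbers plugged in (a one-in-a-million tie probability, a benefit of 10,000 and a cost of 1, in arbitrary units) are assumptions chosen for illustration, not values taken from the cited literature.

```python
def expected_net_benefit(p_decisive, benefit, cost):
    """R = P*B - C from the instrumental voting model."""
    return p_decisive * benefit - cost


def break_even_benefit(p_decisive, cost):
    """Smallest benefit B for which R = P*B - C is no longer negative."""
    return cost / p_decisive


# Hypothetical inputs (assumptions, arbitrary units).
P = 1e-6      # probability that one vote breaks a tie
B = 10_000.0  # benefit if the preferred candidate wins
C = 1.0       # cost of researching candidates and getting to the polls

R = expected_net_benefit(P, B, C)
print(f"R = {R:.2f}")
print(f"break-even B = {break_even_benefit(P, C):,.0f}")
```

With these made-up inputs R is negative, and the benefit would have to be about a million times the cost before voting pays off in this purely instrumental sense.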

Probability P has been considered by many researchers; e.g., see the authoritative treatment by Gelman here: Gelman, A., King, G. and Boscardin, J. W. (1998) ‘Estimating the Probability of Events That Have Never Occurred: When Is Your Vote Decisive?’

You can find a calculation similar to the setup in @whuber's answer in this NBER paper: THE EMPIRICAL FREQUENCY OF A PIVOTAL VOTE, Casey B. Mulligan, Charles G. Hunter. Note that this is empirical research on voting bulletins. However, the theoretical part uses the independent binomial voter setup, see Eq. 3. Their estimate is drastically different from that of @whuber, who came up with $\sim 1/\sqrt{n}$, while this paper derives $P=O(\frac 1 n)$, which yields very low probabilities. The treatment of probabilities is very interesting, and takes into account many non-obvious considerations, such as whether or not a voter realizes what the tie probabilities are.

A simple intuitive explanation follows, from Edlin, Aaron, Andrew Gelman, and Noah Kaplan. "Voting as a rational choice: Why and how people vote to improve the well-being of others." Rationality and society 19.3 (2007): 293-314.

Let f(d) be the predictive or forecast uncertainty distribution of the vote differential d (the difference in the vote proportions received by the two leading candidates). If n is not tiny, f(d) can be written, in practice, as a continuous distribution (e.g., a normal distribution with mean 0.04 and standard deviation 0.03). The probability of a decisive vote is then half the probability that a single vote can make or break an exact tie, or f(0)/n.

The assumption here is that an exact tie vote will be decided by a coin flip.
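To get a feel for the size of $f(0)/n$, here is a small Python sketch using the normal forecast from the quoted passage (mean 0.04, standard deviation 0.03). The function names and the electorate sizes in the loop are assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def decisive_vote_probability(n_voters, mean_diff, sd_diff):
    """f(0)/n, with f a normal forecast density for the vote differential d."""
    return normal_pdf(0.0, mean_diff, sd_diff) / n_voters


f0 = normal_pdf(0.0, 0.04, 0.03)  # forecast from the quoted passage
print("f(0) =", f0)
for n in (20_000, 1_000_000, 100_000_000):  # electorate sizes are assumptions
    print(n, decisive_vote_probability(n, 0.04, 0.03))
```

Under this forecast $f(0)$ is roughly 5.5, so the probability of a decisive vote shrinks like $5.5/n$ rather than like $1/\sqrt{n}$.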

Empirical results

Empirical results suggest that for 20,000 voters, the probability of a tie is $\frac 1 {6000}$, which is significantly lower than the result of @whuber's model, $\frac 1 {2\sqrt{20000\pi}}\approx\frac 1 {500}$.

[Figure]

Another empirical study is Gelman, Andrew, Katz, Jonathan and Bafumi, Joseph, (2004), Standard Voting Power Indexes Do Not Work: An Empirical Analysis, British Journal of Political Science, 34, issue 4, p. 657-674. Its main conclusion was first cited in @user76284's answer.

The authors show that $O(1/\sqrt{n})$ doesn't fit reality. They analyzed a massive amount of electoral data from elections held at many different levels in the USA and elsewhere.

For instance, here's the plot from US presidential elections, 1960-2000, state vote data. It shows the square-root-of-n fit vs. the lowess (non-parametric) fit. It's clear that the square root doesn't fit the data.

[Figure: US presidential elections, 1960-2000, state vote data, with square-root-of-n fit and lowess fit]

Here's another plot, which also includes European election data. Again, the square-root-of-n relation doesn't fit the data.

[Figure: the same comparison, including European election data]

Section 2.2.2 of the paper explains the basic assumption underlying the square root result, which helps in understanding @whuber's approach. Section 5.1 has a theoretical discussion.

bruno
  • 103
  • 1
Aksakal
  • 55,939
  • 5
  • 90
  • 176
  • 11
    Please don't read overmuch into my reply: at no point do I claim or even insinuate that the simple model I have erected is how "it is." My analysis is offered to show how certain reasonable assumptions about the qualitative characteristics of an election lead to answers diametrically opposite the claims in the original quotation. That's all. The most important aspect of the analysis, in my view, is in laying out a clear set of assumptions that can be evaluated, criticized, and compared to what one believes about any particular election. – whuber Feb 28 '20 at 16:48
  • See the comment thread beneath my answer for an explanation of the different probabilities: they assess completely different things. It's not a matter of "non obvious considerations" or "more careful analysis." And of course, one must make some allowance for the difference in space allowed by a white paper and a post on CV! Here, much must be left for interested readers to think through. – whuber Feb 28 '20 at 19:26
  • 1
    @whuber, "Under a coin-flipping model of voting, the probability of decisiveness is proportional to $1/\sqrt{n}$, but this model once again implies elections that are much closer than actually occur (see Mulligan and Hunter, 2002, and Gelman, Katz, and Bafumi, 2004)." from Gelman et al, https://www.nber.org/papers/w13562.pdf – Aksakal Feb 28 '20 at 20:15
  • 3
    Please see my comment elsewhere in this thread about the distinction between probability and conditional probability in this application. – whuber Feb 28 '20 at 23:40
  • 3
    It seems odd to claim that "current economic theory [...] cannot explain why people keep showing up in elections, because it appears to be irrational" while also citing (Edlin, Gelman, and Kaplan, 2007). That paper finds that voting can be rational for social voters. What am I missing? – Aaron Novstrup Feb 29 '20 at 01:59
  • @AaronNovstrup, economic theory - there are many papers on the subject, there's no consensus. Social voting is one such hypothesis that sounds reasonable, but not generally accepted – Aksakal Feb 29 '20 at 04:33
  • @Aksakal I think [my answer](https://stats.stackexchange.com/a/452028/82547) may explain the discrepancy of whuber's answer with empirical results. – user76284 Feb 29 '20 at 21:48
  • @user76284, at this point it's pretty clear where is the discrepancy from. whuber follows a so called "standard voting" power approach, which is described in sufficient detail in section 2.2.2 in Gelman, Andrew, Katz, Jonathan and Bafumi, Joseph, (2004), Standard Voting Power Indexes Do Not Work: An Empirical Analysis, British Journal of Political Science, 34, issue 4, p. 657-674, see http://www.stat.columbia.edu/~gelman/research/published/banzhaf_bjps_final.pdf If you read the paper it clearly shows that empirical evidence contradicts this result, which is not based on reality, a pure theory – Aksakal Feb 29 '20 at 22:11
  • @Aksakal That's one of the references I added to my answer :-) – user76284 Feb 29 '20 at 22:11
  • @user76284, oh, we're citing the same paper! In the paper they also hypothesise that correlation could be a reason why square root n doesn't work. Thanks for pointing it out, I'll reference your answer in mine – Aksakal Feb 29 '20 at 22:23
  • 4
    No reason to multiply by 1/2. Even with an odd number of people, you can still influence the election. You can change the election from having a winner to being a tie, which sounds pretty significant to me. – David Schwartz Mar 01 '20 at 00:28
  • 1
    *"And please don't read overmuch into my reply: at no point do I claim or even insinuate that the simple model I have erected is how "it is.""* @whuber You do make the very strong statements that the question is based on a mathematical **fallacy** and **outright false**. The implicit context is that your reply is instead how "it is", or at least a better picture of the probability of a tie in the vote. – Sextus Empiricus Mar 01 '20 at 21:07
16

I'm going to take a different tack than the other answers, and argue both sides of the question.

First, let's show that voting is a pointless waste of time.

The function of an election is to derive a single outcome, called "the will of the electorate", from many samples of the individual wills of individual electors. Presumably that number of electors is large; we're not concerned here with cases of dozens or hundreds of electors.

When deciding whether you should vote, there are two possibilities. Either, as you note, there is a strong preference -- say, 51% or better -- in the electorate for one outcome. In such a scenario the probability that you will cast the "deciding" vote is minuscule, and so no matter which side of the issue you are on, you're better off staying home and not entailing all the costs of voting.

Now suppose the other possibility: the electorate is so narrowly divided that even a small number of voters choosing to vote or not vote could completely change the outcome. But in this scenario, there is no "will of the electorate" at all! In this scenario you might as well call off the election and flip a coin, saving the expense of the election entirely.


It seems like on rational grounds there is no reason to vote. Suppose a large fraction of the electorate reasons this way -- and, why shouldn't they? I live in the 43rd district of Washington State, one of the most "blue" districts in the United States. No matter which candidate I support in the district election, I can tell you right now what the party affiliation of the winner will be in my district, so why should I vote?

The reason to vote is to consider the strategic consequences of "a large fraction of the electorate considers it pointless and does not vote" upon small groups of ideologues. This attitude hands power to comparatively small, well-organized blocs who may show up en masse when not expected; if the number of voters is greatly reduced by a large fraction "rationally" deciding to stay home and not vote, then the size of a bloc required to swing an election against the clear will of the majority is greatly reduced.

Voting when "not rationally necessary" decreases the probability that an effort to swing the election by a relatively small group will succeed, and thereby increases the probability that the actual will of the majority can be determined.

Eric Lippert
  • 409
  • 2
  • 6
  • 2
    This is the most insightful answer. The fallacy of the original question is the binary/thresholded outcome of a single election, which means the marginal effect of one vote is almost always zero. But in the context of preserving the validity of the electoral process also in future elections, the marginal utility of one vote might be tiny but is present, and therefore the rational choice is to vote (at least for 'moral' individuals) – frederik Mar 01 '20 at 09:28
  • @frederik, careful rational choice studies contradict your statement. Do you have anything to prove your point beyond the short statement? The original question raises a valid point. Rational choice appears to point to the irrationality of participation in elections, yet people regularly turn out in droves. This is a legitimate paradox. The irrationality stems from the extremely low probability that your vote matters. If you remove all the sentiment and focus on science then it becomes an interesting area of study – Aksakal Mar 01 '20 at 15:24
  • 2
    @Aksakal: There are many interesting questions one can pose about rationality of electors. The latter part of my answer hints at a sort of Kantian approach; should a rational person assume that if they arrive at a conclusion that is rational, that *everyone else* will also arrive at that conclusion? – Eric Lippert Mar 01 '20 at 16:25
  • 2
    @Aksakal It really depends on the framing. If you only consider a single election then I agree that the other answers are more pertinent. But if you consider the effect on future elections, then the argument Eric Lippert makes becomes important. There is now a marginal, if very tiny effect of even one vote, as not only the outcome matters, but also the percentage gained. There is still a 'free-rider' problem: as the effect is tiny, a selfish rational individual can still get away with non-voting. But the 'moral' choice to vote is not irrational based on the Kantian argument given by Eric. – frederik Mar 01 '20 at 20:29
  • I haven't read any Kant since "communist philosophy" course that we had to take in uni, don't remember a thing from it except that communists had a strange affection to German philosophy – Aksakal Mar 02 '20 at 00:50
  • @Aksakal it means in short that we do not make choices solely based on a utilitarian weighing of pros and cons (as far as this can even be computed), but also follow a path because of good intentions related to that path itself and not merely because of the goals/results that it will lead to. Many voters go and vote because they simply believe it is good to vote, and they are not making these calculations about the statistical probability that one's vote is actually going to matter or not. – Sextus Empiricus Mar 02 '20 at 01:25
  • But instead of that Kantian moral, you might more easily consider it as ordinary animal behavior where humans are simply following the group. We vote because the others are voting. We want 'our' side to win and do not care about personal loss related to our efforts (In the case of voting these losses are actually very small. So that opens up a possibility of an independent utilitarian explanation where it might be explained with an asymmetric valuation of losses and gains; in the same way, there are many other illogical behaviors like buying lottery tickets). – Sextus Empiricus Mar 02 '20 at 01:38
  • 2
    @SextusEmpiricus: As a math major, I used to have the standard attitude that buying lottery tickets is "a tax on the innumerate" and so on. But then I got to thinking: can you name any other activity that could result in your children affording to go to a top college if you are in the bottom 10% of wealth and income in America that costs a dollar? I can't. A tiny chance at success vs the hopelessly stacked deck that is poverty in America seems like not such a bad choice. **If you think people act against their interests, maybe you haven't understood their interests.** – Eric Lippert Mar 02 '20 at 01:45
  • 1
    Personally I am not at all against buying lottery tickets. I consider this not as a fault of misunderstanding interests but as applying a simplistic expected utility computation instead of applying prospect theory. Buying lottery tickets can be understood, also from an utilitarian point of view (that's why I mentioned lotteries, it is an *example* how we can understand "illogical" behaviour). – Sextus Empiricus Mar 02 '20 at 01:51
  • 1
    I vote because my vote is counted. I grew up in a country where they’d throw out my vote if it wasn’t for communists. It’s a good feeling to cast a vote that is not thrown out so I do it every time. It’s totally irrational – Aksakal Mar 02 '20 at 03:06
  • @Aksakal Kantian imperative states that we should behave in such a way that if our behaviour were a universal law (everybody did this) it would lead to an acceptable outcome. Whereas like this it is phrased as a moral law I believe it can be cloaked as guiding rational choice, as long as you accept that your mind is an even imperfect model for the minds of other people. – frederik Mar 02 '20 at 13:45
  • 1
    @Aksakal Roughly this train of thought "I want to vote for party A, they will implement the more rational policies. But wait, my vote will not make a difference (see argument OP), let's stay at home. Hmm, other rational people like me (and who would vote for A) will think like me and also stay at home, so party B, which has many supporters incapable of thinking through things rationally, will win. I better go and vote after all; other rational people will come to the same conclusion and A will win". So I don't think your vote is irrational after all. – frederik Mar 02 '20 at 13:48
  • 1
    @frederik you make it sound like game theory stuff, a la Nash equilibrium – Aksakal Mar 02 '20 at 13:58
  • @frederik: That's almost exactly what I was getting at but you explained it much more clearly. Though I was thinking more like "small minority party Z will realize that the election is vulnerable thanks to A and B supporters staying home and will rationally make a concerted effort to take advantage of the situation". – Eric Lippert Mar 02 '20 at 16:11
14

The analysis presented in whuber's answer reflects the Penrose square root law, which states that, under certain assumptions, the probability that a given vote is decisive scales like $1/\sqrt{N}$. The assumptions underlying that analysis, however, are too strong to be realistic in most real-world scenarios. In particular, it assumes that the fractions of decided voters for each outcome are virtually identical, as we'll see below.

Below is a graph showing the probability of a tie against the fraction of decided voters for one outcome, given the fraction of decided voters for the other outcome (assuming the rest vote uniformly at random) and the total number of voters:

[Figure: probability of a tie versus fraction decided no, for total = 1000000 and fraction decided yes = 0.45]

The Mathematica code used to create the graph was

fractionYes = 0.45;
total = 1000000;
Plot[
 With[
  {
   y = Round[fractionYes*total],
   n = Round[fractionNo*total],
   u = Round[(1 - fractionYes - fractionNo)*total]
   },
  NProbability[y + yu == n + u - yu, 
   yu \[Distributed] BinomialDistribution[u, 1/2]]
  ],
 {fractionNo, 0, 1 - fractionYes},
 AxesLabel -> {"fraction decided no", "probability of tie"},
 PlotLabel -> 
  StringForm["total = ``, fraction decided yes = ``", total, 
   fractionYes],
 PlotRange -> All,
 ImageSize -> Large
 ]

As the graph shows, whuber's analysis (like the Penrose square root law) is a knife-edge phenomenon: In the limit of growing population size, it requires the fractions of decided voters for each outcome to be exactly equal. Even tiny deviations from this assumption make the probability of a tie very close to zero.
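For readers without Mathematica, the same tie probability can be sketched in plain Python. This is a rough re-implementation under the assumptions of the graph above (decided voters are fixed, undecided voters each flip a fair coin). The function name `tie_probability` and the sample values of the decided-no count are my own choices, and the binomial term is evaluated through `math.lgamma` to avoid overflow.

```python
import math


def tie_probability(total, decided_yes, decided_no):
    """P(yes == no) when the undecided voters each flip a fair coin."""
    undecided = total - decided_yes - decided_no
    if undecided < 0:
        raise ValueError("decided voters exceed the total")
    # A tie needs yu undecided "yes" votes with
    # decided_yes + yu == decided_no + undecided - yu.
    twice_yu = decided_no + undecided - decided_yes
    if twice_yu < 0 or twice_yu % 2 or twice_yu > 2 * undecided:
        return 0.0
    yu = twice_yu // 2
    log_pmf = (math.lgamma(undecided + 1) - math.lgamma(yu + 1)
               - math.lgamma(undecided - yu + 1) - undecided * math.log(2.0))
    return math.exp(log_pmf)


total = 1_000_000
decided_yes = 450_000  # fractionYes = 0.45, as in the graph
for decided_no in (450_000, 451_000, 455_000):  # sample values (assumptions)
    print(decided_no, tie_probability(total, decided_yes, decided_no))
```

Moving the decided-no count from 45.0% to 45.5% of the electorate is enough to push the tie probability from about 0.0025 to something indistinguishable from zero, which is the knife-edge behaviour described above.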

This might explain its discrepancy with the empirical results presented in Aksakal's answer. For example, Standard voting power indexes do not work: An empirical analysis (Cambridge University Press, 2004) by Gelman, Katz, and Bafumi says:

Voting power indexes such as that of Banzhaf are derived, explicitly or implicitly, from the assumption that all votes are equally likely (i.e., random voting). That assumption implies that the probability of a vote being decisive in a jurisdiction with $n$ voters is proportional to $1/\sqrt{n}$. In this article the authors show how this hypothesis has been empirically tested and rejected using data from various US and European elections. They find that the probability of a decisive vote is approximately proportional to $1/n$. The random voting model (and, more generally, the square-root rule) overestimates the probability of close elections in larger jurisdictions. As a result, classical voting power indexes make voters in large jurisdictions appear more powerful than they really are. The most important political implication of their result is that proportionally weighted voting systems (that is, each jurisdiction gets a number of votes proportional to $n$) are basically fair. This contradicts the claim in the voting power literature that weights should be approximately proportional to $\sqrt{n}$.

See also Why the square-root rule for vote allocation is a bad idea by Gelman.

user76284
  • 791
  • 3
  • 20
  • 2
    Another way to demonstrate the discrepancy is by plotting the middle term of a binomial distribution and the middle term of a betabinomial distribution as a function of $n$. For the betabinomial the probability will be lower and it will in general shift from a $\frac{1}{\sqrt{n}}$ dependency to a $\frac{1}{n}$ dependency (like [this image](https://i.stack.imgur.com/X89l1.png)). The binomial distribution sounds like a nice model but there is no good reason to choose it over a betabinomial distribution when the emperical analysis does not support it. – Sextus Empiricus Mar 01 '20 at 21:01
4

It is easy to construct situations where voting matters. For example, if the population consists of 3 people (including myself), one of whom votes red and one blue, then clearly my vote matters.

Of course, your quote is not about such trivial cases, but about real-life situations with maybe millions of voters.

So let us extend my trivial example:

Let $X=1$ indicate that the votes of all other voters result in a tie (thus $X=0$ means no tie).

Let $Y=1$ indicate that my vote "matters". My vote only matters if all the other votes result in a tie. Otherwise it does not matter.

Therefore $P\left(Y=1 \vert X = 1\right) = 1$ and $P\left(Y=1 \vert X = 0\right) = 0$.

This means there is no universal answer. Whether your vote "matters" depends completely on the actions of all other voters.

Your question is thereby already answered (the answer being: it depends on how the others act), but you can ask follow-up questions: across different elections, how often does my vote matter on average?

Or in mathematical terms: $P\left(Y=1 \right) = ?$

$P\left(Y=1 \right) = P\left(Y=1 \vert X = 1\right) P\left( X = 1\right) + P\left(Y=1 \vert X = 0\right) P\left( X = 0\right) = P\left( X= 1\right)$.

$P\left( X= 1\right)$ depends on the election and the situation, which I denote as $\theta$: $P\left( X= 1\right) = \int P\left( X= 1 \vert \Theta = \theta \right) f \left(\theta\right)\,d\theta$, where $f$ is the sampling distribution of the election. Realistically, for the overwhelming majority of $\theta$, $P\left( X= 1 \vert \Theta = \theta \right)$ will be very close to zero.

Now comes my critique of whuber's solution: $f$ represents the votes you might participate in over your whole lifetime. It will include elections with different candidates, in different years, on different topics, and so on. This variability is underrepresented in whuber's solution, because it implicitly assumes that there are only elections with a tie among the decided supporters (meaning $f$ is a point mass on an unbelievably improbable event) and that $P\left( X= 1 \vert \theta \right)$ is simply the binomial probability of a tie among the voters who are undecided.

$f$ should reflect the whole variability of elections. To say it is deterministic at the particular situation of equality between the parties is clearly an over-simplified representation of reality, and even in this artificial case the probability is $\frac{1}{10000}$. If I vote 10 times in a lifetime, I need 1000 lives before my vote finally matters.
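The arithmetic behind that last figure can be written out as follows. It is only a sketch: it treats the 10 elections per lifetime as independent trials, each with the same probability $P = \frac{1}{10000}$ of a tie among the other voters, which is an assumption made for simplicity.

$$\mathbb{E}[\text{number of elections until my vote matters}] = \frac{1}{P} = 10000, \qquad \frac{10000 \text{ elections}}{10 \text{ elections per life}} = 1000 \text{ lives}.$$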

PS: I strongly believe that voting matters, but not in a statistically describable way. That is a different discussion, on a philosophical topic rather than a statistical one.

ghlavin
  • 457
  • 2
  • 6
  • 4
    I don't believe your critique of whuber's answer is informed. It isn't that their answer under-represents the variability, it is that it **explicitly** excludes the trivial case of lopsided elections (X=0 and Y=0). In contrast, when an election is close, by definition m1 and m2 are close. Which type of election you are participating in determines the answer to the OP's question - not an amalgamation of all elections one participates in. If it is a well known or high probability outcome election - your vote probably doesn't matter. If it is unknown, it *can* matter. – CramerTV Feb 28 '20 at 23:43
  • It does not exclude X=0 and Y=0; in fact these are extremely probable results, even in the artificial scenario in whuber's solution. I agree with your last statement; as I wrote: "there is no universal answer. If your vote "matters", completely depends on the actions of all other voters." – ghlavin Feb 29 '20 at 00:05
  • @CramerTV The "closeness" required is a lot stronger than you might think. See [my answer](https://stats.stackexchange.com/a/452028/82547). – user76284 Feb 29 '20 at 22:13
  • Not many elections depend on millions of voters. In district-based systems, most districts for most elections have far fewer voters. In proportional systems, the number of votes per seat also tends to be much smaller than a million. – gerrit Feb 29 '20 at 22:55
4

You can consider the probability that the voting result is a tie when there are an even number of total voters (in which case the vote of an individual matters). We consider for simplicity even values of $n$ but this can be extended to odd values of $n$.


Assumption case 1

Let's consider the vote $X_i$ of each voter $i$ as a Bernoulli distributed variable (where $X_i$ is either $1$ or $-1$):

$$P(X_i = x_i) = \begin{cases} p & \quad \text{if $x_i = -1$}\\ 1-p & \quad\text{if $x_i = 1$} \end{cases}$$

and the sum for $n$ people, $Y = \sum_{i=1}^n X_i$, relates to the election result. Note that $Y=0$ means that the result is a tie (the same amount of +1 and -1 votes).

Approximate solution case 1

This sum can be approximated with a normal distribution:

$ P(Y_n = y) \to \frac{1}{\sqrt{n}} \frac{1}{\sqrt{2 \pi p (1-p) }} e^{-\frac{1}{2} \frac{(y/2-(0.5-p)n)^2}{p(1-p)n}}$

and the probability for a tie is:

$P(Y_n = 0) \to \frac{1}{\sqrt{n}} \frac{1}{\sqrt{2 \pi p (1-p) }} e^{-\frac{1}{2} \frac{(p-0.5)^2}{p(1-p)}n}$

This simplifies for $p=0.5$ to the results shown in other answers (the exponential term will be equal to one):

$ P(Y_n = 0 \vert p = 0.5) \to \sqrt{\frac{2}{n\pi}} $

But for other probabilities, $p \neq 0.5$, the function will behave similarly to a function like $\frac{e^{-x}}{\sqrt{x}}$, and the drop due to the exponential term will become dominant at some point.
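A quick numerical check of case 1 can be done in Python. The sketch below compares the exact binomial tie probability with the normal approximation for the tie probability given above. The function names and the grid of $n$ values are assumptions for illustration, and the exact term is computed in log space with `math.lgamma` so that large $n$ does not overflow.

```python
import math


def exact_tie_probability(n, p):
    """P(Y_n = 0): exactly n/2 of n independent voters choose each side."""
    if n % 2:
        return 0.0  # an odd number of voters cannot tie
    k = n // 2
    log_pmf = (math.lgamma(n + 1) - 2.0 * math.lgamma(k + 1)
               + k * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def normal_tie_approximation(n, p):
    """Normal approximation to the tie probability for fixed p."""
    spread = p * (1.0 - p)
    exponent = -0.5 * (p - 0.5) ** 2 * n / spread
    return math.exp(exponent) / math.sqrt(2.0 * math.pi * spread * n)


for n in (100, 1_000, 10_000, 100_000):
    for p in (0.5, 0.52):
        print(n, p, exact_tie_probability(n, p), normal_tie_approximation(n, p))
```

For $p=0.5$ the values fall off like $1/\sqrt{n}$, while for $p=0.52$ the exponential term takes over and the tie probability collapses well before $n$ reaches $10^5$.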


Assumption case 2

You can also consider a problem like case 1 but now the probability for the votes $X_i$ is not a constant value $p$ but it is itself some variable drawn from a distribution (this expresses sort of mathematically that the random vote for each voter is not fifty-fifty each election and we do not really know what it is, hence we model $p$ as a variable).

Let's for simplicity say that $p$ follows some distribution $f(p)$ between 0 and 1. For each election the odds will be different for a candidate.

What happens here is that, with growing $n$, the random behaviour of the individual $X_i$ evens out and the distribution of the sum $Y_n$ increasingly resembles the (rescaled) distribution of $p$.

$\begin{array}{rcl} P(Y_n = y) \to P\left(\frac{y+n-1}{2n} < p < \frac{y+n+1}{2n}\right) &=& \int_{\frac{y+n-1}{2n}}^{\frac{y+n+1}{2n}} f(p) dp \\ &\approx& f\left(\frac{y+n}{2n}\right) \frac{1}{n} \end{array}$

and for the probability of a tie you get

$P(Y_n=0) \to \frac{f(0.5)}{n}$

This matches the experimental results better, as well as the $\frac{1}{n}$ relationship that Aksakal mentions in his answer.

So the $\frac{1}{n}$ relationship does not stem from the randomness in the binomial distribution, that is, from the probability that the votes $X_i$ of randomly behaving voters sum to a tie. Instead it stems from the distribution of the parameter $p$, which describes how voting behaviour varies from election to election: the $\frac{1}{n}$ term is the probability, $0.5 - \frac{1}{2n} < p < 0.5 + \frac{1}{2n}$, that $p$ is very close to fifty-fifty.
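The $\frac{f(0.5)}{n}$ limit can be checked numerically. This is only a sketch under an assumed form for $f(p)$: a Beta distribution with mean `mu` and concentration `s` (shape parameters `mu * s` and `(1 - mu) * s`), with `s = 1000` chosen for illustration. The exact tie probability is then the beta-binomial probability at $n/2$, which can be written with base R functions; the helper names are hypothetical.

```r
# Exact tie probability when p ~ Beta(mu * s, (1 - mu) * s):
# beta-binomial pmf at k = n/2, computed on the log scale for stability.
tie_betabinom <- function(n, mu, s) {
  a <- mu * s
  b <- (1 - mu) * s
  k <- n / 2
  exp(lchoose(n, k) + lbeta(k + a, n - k + b) - lbeta(a, b))
}

# Large-n approximation f(0.5) / n from the text.
tie_density <- function(n, mu, s) dbeta(0.5, mu * s, (1 - mu) * s) / n

n <- 2^(10:16)
ratio_50 <- tie_betabinom(n, 0.50, 1000) / tie_density(n, 0.50, 1000)
ratio_52 <- tie_betabinom(n, 0.52, 1000) / tie_density(n, 0.52, 1000)
print(round(cbind(n, ratio_50, ratio_52), 3))
```

The ratios should move towards 1 once $n$ is large compared with the concentration `s`, which is when the spread of $p$ dominates the binomial noise.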

Example plot

The different cases are plotted in the graph below. For case 1 the result depends on whether $p=0.5$ or $p\neq 0.5$. In the example we plotted $p=0.52$ along with $p=0.5$. You can see that this already makes a large difference.

You could say that for $p \neq 0.5$ the probability that the vote matters is very tiny and drops dramatically already for $n>100$. In the plot you see the example with $p=0.52$. However, it is not realistic that this probability is fixed. Consider for instance swing states in the US presidential elections. From year to year you see variation in how states tend to vote. That variation is not due to the random behaviour of the $X_i$ according to some Bernoulli distribution, but to the random behaviour of $p$ (i.e. changes in the political climate). In the plot you can see what happens for a beta-binomial distributed variable where the mean of $p$ is equal to 0.52. For higher values of $n$, the probability of a tie is now higher than with a fixed $p=0.52$. Also, the actual value of the mean of $p$ matters less than how widely $p$ is dispersed.

example

R-Code to replicate the image:

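library(rmutil)  # assumption: dbetabinom(y, size, m, s) is not in base R; rmutil's version matches the calls below

p = 0.52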
q = 1-p

## compute probability of a tie
n  <- 2 ^ c(1:16)
y  <- dbinom(n/2,n,0.5)
y2 <- dbinom(n/2,n,p)
y3 <- dbetabinom(n/2,n,0.5,1000)
y4 <- dbetabinom(n/2,n,0.52,1000)

# plotting
plot(n,y, ylim = c(0.0001,1), xlim=c(1,max(n)), log = "xy", yaxt="n", xaxt = "n",
     ylab = bquote(P(Y[n]==0)),cex.lab=0.9,cex.axis=0.7,
     cex=0.8)
axis(1      ,c(1,10,100,1000,10000),cex.axis=0.7)
axis(2,las=2,c(1,0.1,0.01,0.001),cex.axis=0.7)
points(n,y2, col=2,  cex = 0.8)
points(n,y3, col=1, pch=2, cex = 0.8)
points(n,y4, col=2, pch=2, cex = 0.8)

x <- seq(1,max(n),1)


## compare with estimates


# binomial distribution with equal probability
lines(x,sqrt(2/pi/x) ,col=1,lty=2)

# binomial distribution with probability p
lines(x,1/sqrt(2*pi*p*q)/sqrt(x) * exp(-0.5*(p-0.5)^2/(p*q)*x),col=2,lty=2)

# betabinomial distribution with mean 0.5 and dispersion parameter 1000
lines(x, dbeta(0.5,0.5*1000,0.5*1000)/x ,col=1)


# betabinomial distribution with mean 0.52 and dispersion parameter 1000
lines(x, dbeta(0.5,0.52*1000,0.48*1000)/x ,col=2)


legend(1,10^-2, c("p=0.5", "p=0.52", "betabinomial with mu=0.5",  "betabinomial with mu=0.52"), col=c(1,2,1,2), lty=c(2,2,1,1), pch=c(1,1,2,2),
       box.col=0, cex= 0.7)

Assumption case 3

A different way to look at it is to consider that you have two pools of voters (with fixed or variable size) out of which the voters randomly decide whether or not to show up for the election. The election result is then the difference of two binomially distributed variables, and you can handle the situation like the problems above. You get something like case 1 if the probabilities to show up are considered fixed, and something like case 2 if they are not fixed. The expression is a bit more difficult now (the difference between two binomially distributed variables is not easy to express), but you could use the normal approximation to solve this.
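As a minimal sketch of the fixed-probability version of this case, assume two pools of sizes `N1` and `N2` whose members each show up with probabilities `q1` and `q2`. All four values below are hypothetical and chosen only for illustration. A tie occurs when both pools deliver the same number of votes.

```r
# Hypothetical pools: sizes and turnout probabilities are illustrative only.
N1 <- 1000; q1 <- 0.60   # supporters of the first candidate
N2 <- 1000; q2 <- 0.60   # supporters of the second candidate

# Exact: P(A = B) = sum over k of P(A = k) * P(B = k).
k <- 0:min(N1, N2)
pool_tie_exact <- sum(dbinom(k, N1, q1) * dbinom(k, N2, q2))

# Normal approximation: A - B has mean N1*q1 - N2*q2 and
# variance N1*q1*(1-q1) + N2*q2*(1-q2); evaluate its density at 0.
pool_tie_normal <- dnorm(0,
                         mean = N1 * q1 - N2 * q2,
                         sd   = sqrt(N1 * q1 * (1 - q1) + N2 * q2 * (1 - q2)))

print(c(exact = pool_tie_exact, normal = pool_tie_normal))
```

Because the difference of the two counts moves in steps of one vote, the normal density at zero can be compared directly with the exact tie probability.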

Assumption case 4

You consider the case that the number of voters is not known ("unknown number of voters"). If this is relevant then you could integrate/average the above solutions over some distribution of the number of voters that are expected. If this distribution is narrow then the result will not be much different.
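A small sketch of this averaging, under assumptions that are not in the original: the number of voters is taken to be uniform over the even values from 9000 to 11000, and votes follow case 1 with $p=0.5$.

```r
# Hypothetical turnout distribution: even n, uniform on 9000..11000.
n_values <- seq(9000, 11000, by = 2)
weights  <- rep(1 / length(n_values), length(n_values))

# Average the fixed-n tie probability (case 1, p = 0.5) over that distribution.
tie_mixture <- sum(weights * dbinom(n_values / 2, n_values, 0.5))

# Tie probability if the number of voters were known to be exactly 10000.
tie_fixed <- dbinom(5000, 10000, 0.5)

print(c(mixture = tie_mixture, fixed = tie_fixed, ratio = tie_mixture / tie_fixed))
```

With a turnout distribution this narrow, the averaged value stays very close to the fixed-$n$ value, in line with the remark above.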

Sextus Empiricus
  • Small sidenote: here I derived the $\frac{1}{\sqrt{n}}$ relationship using the normal distribution approximation to a binomial distributed variable. Under the hood this relates to the use of Stirling's formula in whuber's answer (because the normal distribution approximation can be derived using Stirling's formula). – Sextus Empiricus Mar 05 '20 at 14:44
  • @Aksakal I am portraying *both* situations $\frac{1}{\sqrt{n}} \exp(-an)$ and $\frac{1}{n}$. In the second case I show why you get $\frac{1}{n}$. The $\frac{1}{\sqrt{n}}$ is not due to 'the vote is close' but due to 'the vote probability is fixed'. Note that I also plotted the result for a betabinomial distribution where the mean of the results is fifty-fifty. The deviation from $\frac{1}{\sqrt{n}}$ and change into $\frac{1}{n}$ is due to the dispersion. – Sextus Empiricus Mar 05 '20 at 15:37
  • 1
    Ok, I removed my comment, thanks for clarification. we do not know $p$ in advance, and it varies as empirical results show. your demonstration explains this part well. it's the best direct demo of what's wrong with @whuber answer. I hoped that OP would figure it too with his simulation iterations, but he went wayward at some point – Aksakal Mar 05 '20 at 15:42
2

A simple model. A new captain has to be chosen on a ship. There are 6 voters. Two candidates have agreed to compete for the office: audacious Mr. Zero and brilliant Mr. One. Nobody on deck is obliged to vote. We don't know how many voters will take part in the election.

Simulation

  • The number of voters participating in the vote is given by a die roll {1,2,3,4,5,6}
  • The choice of candidate by each voter is given by a coin flip {0,1}

A strong decisive vote means that our candidate receives exactly one more vote than the competitor. This is only possible if an odd number of voters take part in the election.

A weak decisive vote means that our candidate either receives exactly one more vote (odd number of voters) or the election ends in a tie (even number of voters).

We count decisive votes in favor of Mr. One, which gives the following possible events.

+-----+------+----------+------------+---------+---------+------------+------------+
|     | sub  | election |   number   |  votes  |  votes  |   strong   |    weak    |
| #   | case |  result  | of voters  |  for 1  |  for 0  |  decisive  |  decisive  |
+-----+------+----------+------------+---------+---------+------------+------------+
+-----+------+----------+------------+---------+---------+------------+------------+
| 1   | 1    | 0        | 1          | 0       | 1       | 0          | 0          |
| 2   | 2    | 1        | 1          | 1       | 0       | 1          | 1          |
+-----+------+----------+------------+---------+---------+------------+------------+
| 3   | 1    | 00       | 2          | 0       | 2       | 0          | 0          |
| 4   | 2    | 01       | 2          | 1       | 1       | 0          | 1          |
| 5   | 3    | 10       | 2          | 1       | 1       | 0          | 1          |
| 6   | 4    | 11       | 2          | 2       | 0       | 0          | 0          |
+-----+------+----------+------------+---------+---------+------------+------------+
| 7   | 1    | 000      | 3          | 0       | 3       | 0          | 0          |
| 8   | 2    | 001      | 3          | 1       | 2       | 0          | 0          |
| 9   | 3    | 010      | 3          | 1       | 2       | 0          | 0          |
| 10  | 4    | 011      | 3          | 2       | 1       | 1          | 1          |
| 11  | 5    | 100      | 3          | 1       | 2       | 0          | 0          |
| 12  | 6    | 101      | 3          | 2       | 1       | 1          | 1          |
| 13  | 7    | 110      | 3          | 2       | 1       | 1          | 1          |
| 14  | 8    | 111      | 3          | 3       | 0       | 0          | 0          |
+-----+------+----------+------------+---------+---------+------------+------------+
| 15  | 1    | 0000     | 4          | 0       | 4       | 0          | 0          |
| 16  | 2    | 0001     | 4          | 1       | 3       | 0          | 0          |
| 17  | 3    | 0010     | 4          | 1       | 3       | 0          | 0          |
| 18  | 4    | 0011     | 4          | 2       | 2       | 0          | 1          |
| 19  | 5    | 0100     | 4          | 1       | 3       | 0          | 0          |
| 20  | 6    | 0101     | 4          | 2       | 2       | 0          | 1          |
| 21  | 7    | 0110     | 4          | 2       | 2       | 0          | 1          |
| 22  | 8    | 0111     | 4          | 3       | 1       | 0          | 0          |
| 23  | 9    | 1000     | 4          | 1       | 3       | 0          | 0          |
| 24  | 10   | 1001     | 4          | 2       | 2       | 0          | 1          |
| 25  | 11   | 1010     | 4          | 2       | 2       | 0          | 1          |
| 26  | 12   | 1011     | 4          | 3       | 1       | 0          | 0          |
| 27  | 13   | 1100     | 4          | 2       | 2       | 0          | 1          |
| 28  | 14   | 1101     | 4          | 3       | 1       | 0          | 0          |
| 29  | 15   | 1110     | 4          | 3       | 1       | 0          | 0          |
| 30  | 16   | 1111     | 4          | 4       | 0       | 0          | 0          |
+-----+------+----------+------------+---------+---------+------------+------------+
| 31  | 1    | 00000    | 5          | 0       | 5       | 0          | 0          |
| 32  | 2    | 00001    | 5          | 1       | 4       | 0          | 0          |
| 33  | 3    | 00010    | 5          | 1       | 4       | 0          | 0          |
| 34  | 4    | 00011    | 5          | 2       | 3       | 0          | 0          |
| 35  | 5    | 00100    | 5          | 1       | 4       | 0          | 0          |
| 36  | 6    | 00101    | 5          | 2       | 3       | 0          | 0          |
| 37  | 7    | 00110    | 5          | 2       | 3       | 0          | 0          |
| 38  | 8    | 00111    | 5          | 3       | 2       | 1          | 1          |
| 39  | 9    | 01000    | 5          | 1       | 4       | 0          | 0          |
| 40  | 10   | 01001    | 5          | 2       | 3       | 0          | 0          |
| 41  | 11   | 01010    | 5          | 2       | 3       | 0          | 0          |
| 42  | 12   | 01011    | 5          | 3       | 2       | 1          | 1          |
| 43  | 13   | 01100    | 5          | 2       | 3       | 0          | 0          |
| 44  | 14   | 01101    | 5          | 3       | 2       | 1          | 1          |
| 45  | 15   | 01110    | 5          | 3       | 2       | 1          | 1          |
| 46  | 16   | 01111    | 5          | 4       | 1       | 0          | 0          |
| 47  | 17   | 10000    | 5          | 1       | 4       | 0          | 0          |
| 48  | 18   | 10001    | 5          | 2       | 3       | 0          | 0          |
| 49  | 19   | 10010    | 5          | 2       | 3       | 0          | 0          |
| 50  | 20   | 10011    | 5          | 3       | 2       | 1          | 1          |
| 51  | 21   | 10100    | 5          | 2       | 3       | 0          | 0          |
| 52  | 22   | 10101    | 5          | 3       | 2       | 1          | 1          |
| 53  | 23   | 10110    | 5          | 3       | 2       | 1          | 1          |
| 54  | 24   | 10111    | 5          | 4       | 1       | 0          | 0          |
| 55  | 25   | 11000    | 5          | 2       | 3       | 0          | 0          |
| 56  | 26   | 11001    | 5          | 3       | 2       | 1          | 1          |
| 57  | 27   | 11010    | 5          | 3       | 2       | 1          | 1          |
| 58  | 28   | 11011    | 5          | 4       | 1       | 0          | 0          |
| 59  | 29   | 11100    | 5          | 3       | 2       | 1          | 1          |
| 60  | 30   | 11101    | 5          | 4       | 1       | 0          | 0          |
| 61  | 31   | 11110    | 5          | 4       | 1       | 0          | 0          |
| 62  | 32   | 11111    | 5          | 5       | 0       | 0          | 0          |
+-----+------+----------+------------+---------+---------+------------+------------+
| 63  | 1    | 000000   | 6          | 0       | 6       | 0          | 0          |
| 64  | 2    | 000001   | 6          | 1       | 5       | 0          | 0          |
| 65  | 3    | 000010   | 6          | 1       | 5       | 0          | 0          |
| 66  | 4    | 000011   | 6          | 2       | 4       | 0          | 0          |
| 67  | 5    | 000100   | 6          | 1       | 5       | 0          | 0          |
| 68  | 6    | 000101   | 6          | 2       | 4       | 0          | 0          |
| 69  | 7    | 000110   | 6          | 2       | 4       | 0          | 0          |
| 70  | 8    | 000111   | 6          | 3       | 3       | 0          | 1          |
| 71  | 9    | 001000   | 6          | 1       | 5       | 0          | 0          |
| 72  | 10   | 001001   | 6          | 2       | 4       | 0          | 0          |
| 73  | 11   | 001010   | 6          | 2       | 4       | 0          | 0          |
| 74  | 12   | 001011   | 6          | 3       | 3       | 0          | 1          |
| 75  | 13   | 001100   | 6          | 2       | 4       | 0          | 0          |
| 76  | 14   | 001101   | 6          | 3       | 3       | 0          | 1          |
| 77  | 15   | 001110   | 6          | 3       | 3       | 0          | 1          |
| 78  | 16   | 001111   | 6          | 4       | 2       | 0          | 0          |
| 79  | 17   | 010000   | 6          | 1       | 5       | 0          | 0          |
| 80  | 18   | 010001   | 6          | 2       | 4       | 0          | 0          |
| 81  | 19   | 010010   | 6          | 2       | 4       | 0          | 0          |
| 82  | 20   | 010011   | 6          | 3       | 3       | 0          | 1          |
| 83  | 21   | 010100   | 6          | 2       | 4       | 0          | 0          |
| 84  | 22   | 010101   | 6          | 3       | 3       | 0          | 1          |
| 85  | 23   | 010110   | 6          | 3       | 3       | 0          | 1          |
| 86  | 24   | 010111   | 6          | 4       | 2       | 0          | 0          |
| 87  | 25   | 011000   | 6          | 2       | 4       | 0          | 0          |
| 88  | 26   | 011001   | 6          | 3       | 3       | 0          | 1          |
| 89  | 27   | 011010   | 6          | 3       | 3       | 0          | 1          |
| 90  | 28   | 011011   | 6          | 4       | 2       | 0          | 0          |
| 91  | 29   | 011100   | 6          | 3       | 3       | 0          | 1          |
| 92  | 30   | 011101   | 6          | 4       | 2       | 0          | 0          |
| 93  | 31   | 011110   | 6          | 4       | 2       | 0          | 0          |
| 94  | 32   | 011111   | 6          | 5       | 1       | 0          | 0          |
| 95  | 33   | 100000   | 6          | 1       | 5       | 0          | 0          |
| 96  | 34   | 100001   | 6          | 2       | 4       | 0          | 0          |
| 97  | 35   | 100010   | 6          | 2       | 4       | 0          | 0          |
| 98  | 36   | 100011   | 6          | 3       | 3       | 0          | 1          |
| 99  | 37   | 100100   | 6          | 2       | 4       | 0          | 0          |
| 100 | 38   | 100101   | 6          | 3       | 3       | 0          | 1          |
| 101 | 39   | 100110   | 6          | 3       | 3       | 0          | 1          |
| 102 | 40   | 100111   | 6          | 4       | 2       | 0          | 0          |
| 103 | 41   | 101000   | 6          | 2       | 4       | 0          | 0          |
| 104 | 42   | 101001   | 6          | 3       | 3       | 0          | 1          |
| 105 | 43   | 101010   | 6          | 3       | 3       | 0          | 1          |
| 106 | 44   | 101011   | 6          | 4       | 2       | 0          | 0          |
| 107 | 45   | 101100   | 6          | 3       | 3       | 0          | 1          |
| 108 | 46   | 101101   | 6          | 4       | 2       | 0          | 0          |
| 109 | 47   | 101110   | 6          | 4       | 2       | 0          | 0          |
| 110 | 48   | 101111   | 6          | 5       | 1       | 0          | 0          |
| 111 | 49   | 110000   | 6          | 2       | 4       | 0          | 0          |
| 112 | 50   | 110001   | 6          | 3       | 3       | 0          | 1          |
| 113 | 51   | 110010   | 6          | 3       | 3       | 0          | 1          |
| 114 | 52   | 110011   | 6          | 4       | 2       | 0          | 0          |
| 115 | 53   | 110100   | 6          | 3       | 3       | 0          | 1          |
| 116 | 54   | 110101   | 6          | 4       | 2       | 0          | 0          |
| 117 | 55   | 110110   | 6          | 4       | 2       | 0          | 0          |
| 118 | 56   | 110111   | 6          | 5       | 1       | 0          | 0          |
| 119 | 57   | 111000   | 6          | 3       | 3       | 0          | 1          |
| 120 | 58   | 111001   | 6          | 4       | 2       | 0          | 0          |
| 121 | 59   | 111010   | 6          | 4       | 2       | 0          | 0          |
| 122 | 60   | 111011   | 6          | 5       | 1       | 0          | 0          |
| 123 | 61   | 111100   | 6          | 4       | 2       | 0          | 0          |
| 124 | 62   | 111101   | 6          | 5       | 1       | 0          | 0          |
| 125 | 63   | 111110   | 6          | 5       | 1       | 0          | 0          |
| 126 | 64   | 111111   | 6          | 6       | 0       | 0          | 0          |
+-----+------+----------+------------+---------+---------+------------+------------+
|     |      |          |            |         |         | 14         | 42         |
+-----+------+----------+------------+---------+---------+------------+------------+

So there are 126 possible election results. In 14 of them we cast a strong decisive vote and in 42 of them a weak decisive vote. So the probability that we cast a decisive vote is:

  • 14/126=11.11% (strong decisive vote)
  • 42/126=33.33% (weak decisive vote)

Here is a summary table:

+--------+-------+--------+------+--------+-------+--------+-------+--------+
|  # of  |       |       sum     | cumulative sum |   probability  |        |
| voters | cases | strong | weak | strong | weak  | strong | weak  | approx |
+--------+-------+--------+------+--------+-------+--------+-------+--------+
| 1      | 2     | 1      | 1    | 1      | 1     | 50.0%  | 50.0% | 28.2%  |
| 2      | 4     | 0      | 2    | 1      | 3     | 16.7%  | 50.0% | 19.9%  |
| 3      | 8     | 3      | 3    | 4      | 6     | 28.6%  | 42.9% | 16.3%  |
| 4      | 16    | 0      | 6    | 4      | 12    | 13.3%  | 40.0% | 14.1%  |
| 5      | 32    | 10     | 10   | 14     | 22    | 22.6%  | 35.5% | 12.6%  |
| 6      | 64    | 0      | 20   | 14     | 42    | 11.1%  | 33.3% | 11.5%  |
+--------+-------+--------+------+--------+-------+--------+-------+--------+

approx has been calculated according to the formula suggested by whuber:

$P(\text{tie}) = \frac{1}{2\sqrt{n\pi}}$

Maybe this approximation works for a higher number of voters, but I am not sure yet. For a small number of voters it is far from the theoretical values in the table.
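The counts in the tables above can be reproduced without listing every case, since they are binomial coefficients. The sketch below (variable names are illustrative) rebuilds the summary table. As a separate check it also weights each die outcome by 1/6, following the remark in the comments that the 126 cases are not equally likely when the number of voters comes from a die roll.

```r
n <- 1:6
cases <- 2^n

# Strong: Mr. One wins by exactly one vote (only possible for odd n).
strong <- (n %% 2 == 1) * choose(n, ceiling(n / 2))
# Weak: Mr. One wins by exactly one vote (odd n) or the vote is tied (even n).
weak <- choose(n, ceiling(n / 2))

summary_table <- data.frame(
  n, cases, strong, weak,
  p_strong_pooled = cumsum(strong) / cumsum(cases),
  p_weak_pooled   = cumsum(weak)   / cumsum(cases),
  approx          = 1 / (2 * sqrt(n * pi))
)
print(round(summary_table, 3))

# Weighting each die outcome by 1/6 instead of pooling all 126 cases:
p_strong_die <- mean(strong / cases)
p_weak_die   <- mean(weak / cases)
print(c(strong = p_strong_die, weak = p_weak_die))
```

The pooled columns match the 11.1% and 33.3% above, while the die-weighted values are about 19.8% (strong) and 39.6% (weak).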


Please consider this answer an extension of the question. I would be grateful if anybody could post an equation for the decisive vote probability as a function of the (unknown) number of voters taking part in the election.


For larger numbers, already above 10 voters, we see that the probability of a difference of 1 or less approaches the theoretical value (based on the binomial distribution with $p=0.5$) very quickly. But we need to use $\sqrt{\frac{2}{\pi n}}$. The image below demonstrates this.

comparison

Sextus Empiricus
Przemyslaw Remin
  • there are two answers already, @whuber answer is $O(1/\sqrt{n})$ and mine is $O(1/n)$. I believe mine fits the empirical data and simulation done by researchers in the field. it's actually not my answer in this regard, but I refer to an established result. your simple model is grossly unrealistic, it's not like we go to a polling station and flip a coin. many people have very clear idea who they want, it's just the question of whether they show up at the polling station to actually cast a vote. also the challenge with "simple" models it's easy to critique them, e.g. delegate system in USA etc – Aksakal Mar 02 '20 at 14:32
  • with a coin flip model of yours you should get to a result similar to @whuber because it's a very similar assumption of 50% split in preferences, maybe be expressed differently but effectively the same. you should get $O(1/\sqrt{n})$ answer or even much stronger, i.e. overestimate the probability grossly. the empirical research contradicts this result, and the main reason is that the dispersion of the split is much wider than what this type of model assumes around 50% – Aksakal Mar 02 '20 at 14:38
  • What is O in the formula? – Przemyslaw Remin Mar 02 '20 at 14:40
  • I'm saying proportional to $1/n$, where $n$ is number of voters. So, it's not exactly $1/n$ but can be a number of that magnitude, say $2/n$ etc. See e.g. https://en.wikipedia.org/wiki/Big_O_notation#Orders_of_common_functions – Aksakal Mar 02 '20 at 14:41
  • run your simulation on increasing number of voters, say, 10,100,1000,10000 etc. then draw a line showing the percentage of strong decisive vote on the number of voters, this should give you a better idea of what your model predicts for larger elections – Aksakal Mar 02 '20 at 14:58
  • The difference is that you are counting a different thing. You are summing up the cases when 1 to n people vote instead of considering only the case when n people vote (sidenote: it is wrong to call the frequency of events among the cases in your table a probability, because the cases do not have equal probability when you say *"We don't know how many voters will take part in the election."*). – Sextus Empiricus Mar 02 '20 at 17:14
  • 2
    @PrzemyslawRemin, now tweak your simulation to allow for probability of voting one way or another to vary, i.e. not be exactly 1/2, and see what happens. Play with different dispersions of a split, e.g. Normal distribution $\mathcal N(0.5,\sigma^2)$, where your variance small or large. you already saw what happens in this model of exactly 1/2, where the square root type of relationship to $n$ arises, once you allow for variation of the split, you'll see how the decline speeds up – Aksakal Mar 02 '20 at 17:19
  • You are missing to count all the possibilities that start with a zero. For example: when two people vote, then not only 10 will result in a tie but also 01. – Sextus Empiricus Mar 02 '20 at 17:37
  • @SextusEmpiricus Of course! Facepalm! – Przemyslaw Remin Mar 03 '20 at 07:09
  • @Przemyslawremin the more specific problem that you currently describe here is solved by whuber's answer. The distribution of votes for 1 and votes for 0 follows a binomial distribution and if the probability of a vote for 0:1 is fifty-fifty then you get this 1/sqrt(n) relationship. But note that relaxing that fifty-fifty condition will change that and may turn it into Aksakal's 1/n relationship. I wonder now what is actually your question. Is it the little problem about the very specific fifty-fifty case, or is it the bigger problem of a more general (realistic) setting? – Sextus Empiricus Mar 05 '20 at 09:03
  • @SextusEmpiricus I am looking for realistic probability but I wanted to start with the theoretical example to have a starting point and better understand the problem. Now I would like to confirm that whuber approximation is the highest possible probability that may exist for a tie. – Przemyslaw Remin Mar 05 '20 at 09:29
  • @PrzemyslawRemin short note, for the odd numbers your probability will be twice as high for a near tie result. (because you have two situations 1 less or 1 higher, in the case of odd numbers you have only 1 situation for a difference less than one vote). I have added an image to your answer to illustrate this. For the odd numbers I doubled the value (for the situation that you either have a difference -1 or +1) you can halve this if you like, if you are thinking only about the difference in favor of one direction. You can remove the change if you want. – Sextus Empiricus Mar 05 '20 at 10:52
  • @PrzemyslawRemin your bookkeeping is a bit strange. For n=2 I count 2/4=50% cases. For n=3 and n=4 I count 3/8 and 6/16 cases. For n=5 and n=6 I count 10/32 and 20/64 cases. – Sextus Empiricus Mar 05 '20 at 10:57
  • @SextusEmpiricus I leave it as you added it. Do you mean the probability of what I called strong decisive vote? – Przemyslaw Remin Mar 05 '20 at 10:58
  • @PrzemyslawRemin I have changed it now. Before I computed the difference that the difference is between -1 or 1 and your weak decisive vote is about the difference between 0 or 1 (one more but not one less). This changes the odd numbers but not the even numbers. – Sextus Empiricus Mar 05 '20 at 11:10
  • The graph that I plotted shows weak/cases. Which is for your case 1/2, 2/4, 3/8, 6/16, 10/32, 20/64. What you plotted is the cumulative cases (and for the specific strong case only). – Sextus Empiricus Mar 05 '20 at 11:13