Strange election results and probability of election fraud

Question

Suppose an election is held for the leadership position in a major political party. Four candidates are running. After the election, the following results are announced:

    Candidate A: 160823 votes
    Candidate B: 115162 votes
    Candidate C: 82028 votes
    Candidate D: 46065 votes

Candidate A is declared the winner by a clear margin. However, at that point people notice that the percentages derived from the announced numbers of votes, when rounded to three decimal digits, look suspiciously round:

    Candidate A: 39.800 %
    Candidate B: 28.500 %
    Candidate C: 20.300 %
    Candidate D: 11.400 %
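The rounded percentages can be re-derived from the announced vote counts. Below is a minimal Python sketch; the variable and function names are illustrative and not part of the question. It uses integer arithmetic with round-half-up, so floating-point representation cannot affect the third decimal.

```python
# Announced vote counts from the question.
votes = {"A": 160823, "B": 115162, "C": 82028, "D": 46065}
total = sum(votes.values())

def thousandths_of_percent(v, n):
    """100*v/n rounded half-up to three decimals, as an integer count of 0.001 %.
    Integer arithmetic avoids any floating-point rounding doubts."""
    return (200_000 * v + n) // (2 * n)

for name, v in votes.items():
    t = thousandths_of_percent(v, total)
    # t // 1000 is the integer part, t % 1000 the three decimals,
    # and t % 100 holds the 2nd and 3rd decimal digits.
    print(f"{name}: {t // 1000}.{t % 1000:03d} %  (2nd/3rd decimals: {t % 100:02d})")
```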

All four rounded percentages have zeros in their 2nd and 3rd decimal digits! This makes many people suspect that someone came up with the percentages first and used the made-up percentages to deduce the numbers of votes. What ensues is a firestorm of speculation about election fraud in social media and the press.

How can one determine the probability of fraud in such a situation? One approach would be to say:

$p(fraud|result) = 1 - p(fair|result)$

where

$p(fair|result) = \frac{p(result|fair)p(fair)}{p(result|fair)p(fair)+p(result|fraud)p(fraud)}$

Then, the priors can be set to reasonable values (e.g. $p(fair)=0.99$), while the conditionals $p(result|fair)$ and $p(result|fraud)$ are trickier.

One could argue that $p(result|fraud) \sim 0.01$, on the grounds that a fraudster would not consider more than a couple of hundred ways to cheat and that setting the percentages to round values would be the most obvious choice for a naive fraudster (the "inexperienced fraudster" argument). However, this is quite hand-wavy.
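To see how strongly the answer depends on the two conditionals, the formula for $p(fair|result)$ can be evaluated directly. A minimal sketch, assuming the question's example values $p(fair)=0.99$ and $p(result|fraud)=0.01$; the values tried for $p(result|fair)$ are arbitrary placeholders, not estimates, and `posterior_fraud` is a hypothetical helper name.

```python
def posterior_fraud(p_result_given_fair, p_result_given_fraud, p_fair):
    """p(fraud|result) = 1 - p(fair|result), via Bayes' rule with p(fraud) = 1 - p(fair)."""
    p_fraud = 1.0 - p_fair
    fair_term = p_result_given_fair * p_fair
    fraud_term = p_result_given_fraud * p_fraud
    p_fair_given_result = fair_term / (fair_term + fraud_term)
    return 1.0 - p_fair_given_result

# p(fair) = 0.99 and p(result|fraud) = 0.01 are the question's example values;
# the p(result|fair) values are illustrative placeholders only.
for p_rf in (1e-2, 1e-4, 1e-6, 1e-12):
    p = posterior_fraud(p_rf, 0.01, 0.99)
    print(f"p(result|fair) = {p_rf:.0e}  ->  p(fraud|result) = {p:.4f}")
```

Because $p(fraud)$ is only 0.01, the posterior moves from about 0.01 to about 0.99 as $p(result|fair)$ drops from $10^{-2}$ to $10^{-6}$, so whatever estimate one adopts for $p(result|fair)$ dominates the conclusion.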

Estimating $p(result|fair)$ is also problematic. On the one hand, getting four percentages that round so nicely in the third decimal digit looks very suspicious, almost fabricated. On the other hand, the probability of getting percentages whose 2nd and 3rd decimal digits round to 00, 00, 00, 00 is not going to be very different from the probability of getting 01, 23, 45, 31, or any other allowed combination, so why single out the former?
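The single-candidate version of this claim can be checked exactly. A sketch, assuming (as a hypothetical simplification that ignores the other three candidates) that one candidate's count could be any integer from 0 to the announced total of 404,078: tally how often each pair of 2nd/3rd decimal digits occurs. Every pair, "00" included, comes up in very nearly 1% of the possible counts.

```python
from collections import Counter

def last_two_decimals(v, n):
    """2nd and 3rd decimal digits of 100*v/n, rounded half-up to three decimals."""
    return ((200_000 * v + n) // (2 * n)) % 100

n = 404_078  # announced total number of votes
# Tally every possible count v = 0..n for a single candidate by its last two decimals.
tally = Counter(last_two_decimals(v, n) for v in range(n + 1))

print("digit pairs seen:", len(tally))
print("'00':", tally[0], " '37':", tally[37])
print("min/max over all 100 pairs:", min(tally.values()), max(tally.values()))
```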

How can one formulate the layman's intuition that four percentages which round so nicely look suspicious? I thought of using the following criterion: find the probability that the rounded percentages can be generated by a simple rule that repeats the same pattern in their 2nd and 3rd decimal digits. For example, the patterns 00, 00, 00, 00 and 25, 25, 25, 25 are the two simplest possible patterns and would be grouped together when estimating the probability. The probability of getting one of the patterns in the lowest simplicity class is very low ($<10^{-12}$ by a Monte Carlo estimate). This is not entirely satisfactory, as it maintains the distinction between "suspicious" and "non-suspicious" numbers but hides it behind the idea of "simple" vs. "non-simple" patterns.
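A rough Monte Carlo sketch of this kind of estimate, restricted to the single pattern 00 rather than the question's simplicity classes. It assumes a hypothetical and deliberately crude "fair" model that is not the one used in the question: the total is fixed at 404,078 and the four counts come from three random cut points in 0..n. It reports the estimated probability that at least k of the four rounded percentages have 00 as their 2nd and 3rd decimal digits, for k = 1..4. Since each percentage hits 00 roughly once in 100 draws, the all-four event is far too rare to observe with the default 100,000 trials; a much larger run or an analytic argument would be needed for that.

```python
import random

def ends_in_00(v, n):
    """True if 100*v/n, rounded half-up to three decimals, has 0 as its 2nd and 3rd decimal digits."""
    return ((200_000 * v + n) // (2 * n)) % 100 == 0

def simulate(n=404_078, trials=100_000, seed=1):
    """Estimate P(at least k of the four rounded percentages end in 00), k = 1..4,
    under a hypothetical 'fair' model: counts come from three random cut points in 0..n."""
    rng = random.Random(seed)
    exactly = [0] * 5  # exactly[k] = number of trials with exactly k percentages ending in 00
    for _ in range(trials):
        a, b, c = sorted(rng.randint(0, n) for _ in range(3))
        counts = (a, b - a, c - b, n - c)  # four non-negative counts summing to n
        exactly[sum(ends_in_00(v, n) for v in counts)] += 1
    return {k: sum(exactly[k:]) / trials for k in range(1, 5)}

if __name__ == "__main__":
    for k, p in simulate().items():
        print(f"P(at least {k} of 4 end in 00) ~ {p:.2e}")
```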

Does anyone have an idea of a different way to arrive at a reasonable estimate of $p(fair|result)$ and/or $p(result|fraud)$, or even an entirely different approach? (You can assume that you do not have access to election results broken down by voting center, area, etc.)

Note: The above is not a hypothetical scenario; it happened in real life. The quoted numbers are the numbers of votes received by the four candidates for the presidency of the Greek party New Democracy on December 20th, 2015.

score 2 · Answer 1 · answered Dec 27 '15 at 12:21

I don't think the rounding or the results themselves can serve as an indicator of fraud. The rounding could simply be an error by the person publicising the results for the government, who may not understand rounding and significant digits.

Rather than looking at the final result, I think a statistical test for fraud could be based on the distribution of the voting results, voter turnout, and pre-election polling. Pre-election polling can be completely inaccurate because it is difficult to predict who will show up at the polls, but it could be useful when considered together with other factors.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3478593/

I think a reporting error during the announcement can be ruled out. You can re-derive the percentages from the numbers of votes and confirm that they round nicely in the 2nd and 3rd decimal digits. However, I tend to agree with you that more data, such as the votes per voting center, would be required to really establish fraud. Four integers are just too little to go on. — nikosd, Dec 27 '15 at 17:41

Tim · Accepted Answer · 2015-12-28T10:03:08.883

It is a common bias, described by Kahneman and Tversky (1972), that people assume random events need to "look" random in order to be a representative sample:

The notion of representativeness is best explicated by specific examples. Consider the following question:

All families of six children in a city were surveyed. In 72 families the exact order of births of boys and girls was GBGBBG. What is your estimate of the number of families surveyed in which the exact order of births was BGBBBB?

The two birth sequences are about equally likely, but most people will surely agree that they are not equally representative. (...) However, when we asked the same Ss to estimate the frequency of the sequence BBBGGG, they viewed it as significantly less likely than GBBGBG (p < .01), presumably because the former appears less random. (...) A major characteristic of apparent randomness is the absence of systematic patterns. A sequence of coin tosses, for example, which contains an obvious regularity is not representative. Thus, alternating sequences of heads and tails, such as HTHTHTHT or TTHHTTHH, fail to reflect the randomness of the process. Indeed, Ss judge such sequences as relatively unlikely and avoid them in producing simulated random sequences (Tune, 1964; Wagenaar, 1970).

There is no reason why a random sequence should not contain round values. A particular sequence with round values is neither more nor less likely than a particular sequence without them. The human mind is built to find patterns around us and so is susceptible to finding nonexistent ones, which leads to multiple cognitive and judgment biases. This is nicely described in Daniel Kahneman's book Thinking, Fast and Slow. David J. Hand, in his book The Improbability Principle, describes many examples of "improbable" events that happen in real life by pure coincidence. Sorry, but there is nothing suspicious about round numbers.

Kahneman, D., & Tversky, A. (1972). Subjective probability: A judgment of representativeness. Cognitive Psychology, 3, 430-454.

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124-1131.

I agree with your comment, with a caveat: some numbers are more special than others, because the vast majority of numbers are their own shortest description, while others can be generated by a short set of rules. Take pi, for example. Its digits look random, but they follow from a simple set of rules. That is not true of most reals. Formally, each pattern comes with an algorithmic complexity (roughly, the length of the shortest program that generates it), and the set of patterns that cannot be compressed to simple rules has higher cardinality than the set of those that can. However, applying this idea to election fraud is problematic. — nikosd, Dec 27 '15 at 17:56
It is problematic because even if the final pattern looks "simple", it might have been a very complex task to find the set of integers that produces the made-up percentages the fraudster wanted. In other words, even if someone had made up the percentages, they would have had a hard time finding integers which produce those percentages. So the simplicity of the percentages does not capture the complexity of the problem. This is the reason I was unhappy with my "pattern simplicity" solution and decided to post here. So, thanks! — nikosd, Dec 27 '15 at 18:02
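How hard it is to find such integers can be probed with a small brute-force search. The sketch below is hypothetical: the function names and the search range 400,000 to 410,000 are arbitrary choices, not from the thread. For each candidate total n it takes each count as the nearest integer to target × n and keeps n if the counts sum to n and reproduce every target percentage exactly when rounded to three decimals. Only the nearest integer is tried per candidate, so the search may miss some solutions, and its output says nothing by itself about whether fraud occurred.

```python
def thousandths_of_percent(v, n):
    """100*v/n rounded half-up to three decimals, as an integer count of 0.001 %."""
    return (200_000 * v + n) // (2 * n)

def totals_reproducing(targets, n_min, n_max):
    """Hypothetical brute-force search over totals n in [n_min, n_max].
    targets are percentages in thousandths (39800 means 39.800 %)."""
    found = []
    for n in range(n_min, n_max + 1):
        # Nearest integer to (t / 100000) * n, rounding half-up, in integer arithmetic.
        counts = [(2 * t * n + 100_000) // 200_000 for t in targets]
        if sum(counts) == n and all(
            thousandths_of_percent(v, n) == t for v, t in zip(counts, targets)
        ):
            found.append((n, counts))
    return found

targets = [39_800, 28_500, 20_300, 11_400]  # 39.800 %, 28.500 %, 20.300 %, 11.400 %
matches = totals_reproducing(targets, 400_000, 410_000)
print(len(matches), "totals in range reproduce the targets")
```

The announced total 404,078 appears in `matches` with exactly the announced counts (160823, 115162, 82028, 46065).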
