Suppose an election is held for the leadership position in a major political party. Four candidates are running. After the election, the following results are announced:
Candidate A: 160823 votes
Candidate B: 115162 votes
Candidate C: 82028 votes
Candidate D: 46065 votes
Candidate A is declared the winner by a clear margin. However, people then notice that the percentages computed from the announced vote counts, when rounded to three decimal places, look suspiciously round:
Candidate A: 39.800 %
Candidate B: 28.500 %
Candidate C: 20.300 %
Candidate D: 11.400 %
All four rounded percentages have zeros in their 2nd and 3rd decimal digits! This leads many people to suspect that someone made up the percentages first and then used them to derive the vote counts. What ensues is a firestorm of speculation about election fraud on social media and in the press.
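The observation can be checked directly from the vote counts. The sketch below (the helper name `pct_thousandths` is my own) computes each percentage in units of 0.001 % using exact integer arithmetic with round-half-up, so floating-point error cannot affect the rounding; the last two digits of that integer are the 2nd and 3rd decimal digits of the percentage.

```python
# Announced vote counts.
votes = {"A": 160823, "B": 115162, "C": 82028, "D": 46065}
total = sum(votes.values())  # 404078


def pct_thousandths(v: int, n: int) -> int:
    """Percentage v/n in units of 0.001 %, rounded half up, integer-only."""
    # floor(100 * 1000 * v / n + 1/2)
    return (2 * 100_000 * v + n) // (2 * n)


for name, v in votes.items():
    t = pct_thousandths(v, total)
    # t % 100 holds the 2nd and 3rd decimal digits of the percentage.
    print(f"{name}: {t // 1000}.{t % 1000:03d} %  (2nd/3rd digits: {t % 100:02d})")

# A: 39.800 %  (2nd/3rd digits: 00)
# B: 28.500 %  (2nd/3rd digits: 00)
# C: 20.300 %  (2nd/3rd digits: 00)
# D: 11.400 %  (2nd/3rd digits: 00)
```

The unrounded values are within 0.0001 percentage points of the round figures (e.g. A is about 39.79999 %).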
How can one determine the probability of fraud in such a situation? One approach would be to say:
$p(fraud|result) = 1 - p(fair|result)$
where
$p(fair|result) = \frac{p(result|fair)p(fair)}{p(result|fair)p(fair)+p(result|fraud)p(fraud)}$
The priors can then be set to reasonable values (e.g. $p(fair)=0.99$), while the likelihoods $p(result|fair)$ and $p(result|fraud)$ are trickier to estimate.
One could argue that $p(result|fraud) \sim 0.01$ on the grounds that a fraudster would not consider more than a couple of hundred ways to cheat, and that setting the percentages to round values would be the most obvious choice for a naive fraudster (the "inexperienced fraudster" argument). However, this is quite hand-wavy.
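To see how much the answer hinges on these likelihoods, the Bayes formula above can be evaluated for a few inputs. This is a minimal sketch: $p(fair)=0.99$ and $p(result|fraud)=0.01$ are the values mentioned in the question, while the $p(result|fair)$ values are arbitrary placeholders chosen only to show sensitivity, not estimates for this election.

```python
def p_fraud_given_result(lik_fair: float, lik_fraud: float,
                         prior_fair: float = 0.99) -> float:
    """p(fraud|result) = 1 - p(fair|result), via Bayes' theorem."""
    prior_fraud = 1.0 - prior_fair
    evidence = lik_fair * prior_fair + lik_fraud * prior_fraud
    return lik_fraud * prior_fraud / evidence


# p(result|fraud) = 0.01 is the "inexperienced fraudster" guess;
# the p(result|fair) values below are hypothetical.
lik_fraud = 0.01
for lik_fair in (1e-2, 1e-4, 1e-6):
    post = p_fraud_given_result(lik_fair, lik_fraud)
    print(f"p(result|fair) = {lik_fair:g}: p(fraud|result) = {post:.4f}")

# p(result|fair) = 0.01: p(fraud|result) = 0.0100
# p(result|fair) = 0.0001: p(fraud|result) = 0.5025
# p(result|fair) = 1e-06: p(fraud|result) = 0.9902
```

When the two likelihoods are equal, the posterior simply returns the prior $p(fraud)=0.01$; the conclusion is driven entirely by the ratio $p(result|fraud)/p(result|fair)$.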
Estimating $p(result|fair)$ is also problematic. On the one hand, getting four percentages that round so nicely at the third decimal place looks very suspicious, almost fabricated. On the other hand, the probability of getting percentages whose 2nd and 3rd decimal digits round to 00, 00, 00, 00 is not going to be very different from the probability of getting 01, 23, 45, 31, or any other allowed combination (i.e. one consistent with the percentages summing to 100 %), so why single out the former?
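The "no pair is special" point can be illustrated with a Monte Carlo sketch. It assumes a hypothetical fair model in which the four vote shares are uniform on the simplex (Dirichlet(1, 1, 1, 1), sampled as normalized exponential variates) with about 404,078 voters, and tallies the 2nd/3rd decimal digits of candidate A's rounded percentage. The model and function names are my own illustrative choices.

```python
import random
from collections import Counter


def pct_thousandths(v: int, n: int) -> int:
    """Percentage v/n in units of 0.001 %, rounded half up, integer-only."""
    return (2 * 100_000 * v + n) // (2 * n)


def digit_pair_frequencies(n_trials: int = 100_000, n_voters: int = 404_078,
                           seed: int = 1) -> dict:
    """Frequency of each 2nd/3rd-digit pair (0..99) of candidate A's
    rounded percentage under a hypothetical uniform-on-simplex model."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_trials):
        # Dirichlet(1, 1, 1, 1) shares via normalized Exp(1) draws.
        w = [rng.expovariate(1.0) for _ in range(4)]
        s = sum(w)
        votes = [int(n_voters * x / s) for x in w]
        total = sum(votes)
        counts[pct_thousandths(votes[0], total) % 100] += 1
    return {pair: c / n_trials for pair, c in sorted(counts.items())}


freqs = digit_pair_frequencies()
print(f"frequency of pair 00: {freqs.get(0, 0.0):.4f}")
print(f"min / max over all 100 pairs: {min(freqs.values()):.4f} / {max(freqs.values()):.4f}")
```

Under this model every pair, 00 included, shows up with frequency close to 1/100 for a single candidate, which is why a specific pattern like 00 is not unusual per se.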
How can one formalize the layman's intuition that four percentages which round so nicely look suspicious? I thought of using the following criterion: find the probability that the rounded percentages can be generated by a simple rule that repeats the same pattern in their 2nd and 3rd decimal digits. For example, the patterns 00, 00, 00, 00 and 25, 25, 25, 25 are the two simplest possible patterns and would be grouped together when estimating the probability. The probability of getting one of the patterns in the lowest simplicity class is very low ($<10^{-12}$ by a Monte Carlo estimate). This is not entirely satisfactory, as it keeps the distinction between "suspicious" and "non-suspicious" numbers and merely hides it behind the idea of "simple" vs. "non-simple" patterns.
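The sum constraint already limits which repeated patterns are possible. Each percentage is rounded to 0.001 %, so each rounding error is at most 0.0005 percentage points, and the four rounded values must add up to 100.000 % within $\pm 0.002$. If all four share the same 2nd/3rd digits $dd$, then $4 \cdot dd \bmod 100$ must fall inside that slack. The sketch below (the function name is hypothetical) enumerates the patterns that pass this necessary condition.

```python
def feasible_repeated_patterns(n_candidates: int = 4) -> list:
    """2nd/3rd-digit endings dd (0..99) that all n rounded percentages
    could share, given that they must sum to 100 % up to rounding error.

    Working in units of 0.001 %, each rounding error is at most 0.5, so
    the rounded sum lies in [100000 - n//2, 100000 + n//2]. Because
    100000 is divisible by 100, n * dd (mod 100) must be one of those
    offsets. This is a necessary condition only.
    """
    slack = n_candidates // 2
    allowed = {off % 100 for off in range(-slack, slack + 1)}
    return [dd for dd in range(100) if (n_candidates * dd) % 100 in allowed]


print(feasible_repeated_patterns(4))  # [0, 25, 50, 75]
```

With four candidates only 00, 25, 50 and 75 survive, which is consistent with 00, 00, 00, 00 and 25, 25, 25, 25 being singled out as the simplest class.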
Does anyone have an idea for a different way to arrive at a reasonable estimate of $p(fair|result)$ and/or $p(result|fraud)$, or even an entirely different approach? (You can assume that you do not have access to election results broken down by polling station, region, etc.)
Note: The above is not a hypothetical scenario; it happened in real life. The quoted numbers are the votes received by the four candidates for the presidency of the Greek party New Democracy on December 20, 2015.