12

In this post we ask a question about a natural phenomenon: humans attempting to reach a decision by counting votes. The specific instance of this phenomenon that the question concerns is Brexit.

Note: the question is not about politics. The goal is to discuss this natural phenomenon from a statistical point of view, based on observations.

The specific question is:

  • Question: What does the $51.9\%$ Brexit vote to leave mean? E.g. does it mean that the public really wants to leave the EU? Does it simply mean that the public is unsure and needs more time to think? Or is it something else?

Assumption 1: there is no error in the voting process.

caveman
  • 14
    Democracy is **not** about statistical significance. The 51.9% result means that 51.9% of those who voted, voted "leave". This is not an opinion poll. Those who didn't vote, voted by (not) using their feet. Interpreting 51.9% as "public is unsure and needs more time to think" is simply lying with statistics. Brexit happened with probability 1. – Tim Jun 24 '16 at 20:07
  • I agree on what democracy is. However, I am not trying to look at such political aspects. My goal here is to interpret the numbers from a statistical point of view in order to identify whether it was better for UK to conclude that "*well, we are not sure, let's try again!*" by using a principled statistical method – caveman Jun 24 '16 at 20:30
  • 1
    I think statistical significance *can* be important if the sample size is low. Obviously the results of the poll are all that matter politically, but the thoughts of ALL citizens can be of interest as well. – Underminer Jun 24 '16 at 20:40
  • 7
    This thread is destined to be non-statistical, opinionated, and possibly even polemical. It's just no fit for this site, regardless of how popular it might be. We have a [chat room](http://chat.stackexchange.com/rooms/18/ten-fold) populated by people who would be happy to engage further in such conversations: check it out! – whuber Jun 24 '16 at 20:51
  • 2
    I believe the current discussion is statistically focused and is a good example of interpreting voting results as it applies to statistical testing. – Underminer Jun 24 '16 at 20:56
  • 1
    @Underminer I could have agreed with that earlier, but the comments and an answer by the OP (as well as your comments) are clearly steering this thread into opinionated discussion. – whuber Jun 24 '16 at 21:04
  • Personally my direction on this is to interpret votes (and unvotes) in order to estimate the probability that this case was a grey area. I *think* there is hope in reaching an objective conclusion that explicitly states the assumptions. I think it goes under assuming a distribution for votes when the matter at hand is a grey area. Plus I think this could have been a hot topic that would help boost the popularity of Cross Validated ;) – caveman Jun 24 '16 at 21:16
  • @whuber , my comment is intended to define the statistical population of interest, not steer towards opinionated discussion. The question itself seems to seek statistical explanation. – Underminer Jun 24 '16 at 21:35
  • 3
    You bring up an important issue: measurement error of public opinion metrics such as polls. I'm afraid that main source of error is not from the sample size. – Aksakal Jun 24 '16 at 22:03
  • @Aksakal voting is definitely not an opinion poll and should not be treated as so. – Tim Jun 25 '16 at 10:33
  • *Is the concept of sampling error relevant in election results?* is a focused statistical question that would be well on-topic here, in my opinion. *What does the 51.9% Brexit vote to leave mean? E.g. does it mean that the public really wants to leave EU? Does it simply mean that the public is unsure and needs more time to think? Or is it something else?* is inherently unanswerable here because it asks lots of questions about "what voters really want" that lie far beyond the scope of this site - far better suited to Politics Stack Exchange! – Silverfish Jun 25 '16 at 11:48
  • 1
    You all know how easy it is to mess up the measurements by sample preparation, by filtering out observations etc. That's what happens on the national scale when you have like Brexit. You get Obama trying to scare you into "remain" vote: we won't make trade deals with you if you leave. You get all the media filled with economists, Treasury and BoE promising the end of the world if you leave. All these "experts" are trying to frighten you to stay. There was this massive coercion of British public going on for months, and you tell me there was no sampling bias issues. You guys are funny – Aksakal Jun 25 '16 at 13:38
  • 1
    This was a huge experiment: those in the driving seat trying to convince the passengers to go where they didn't want to go anymore. We have a similar situation in the USA. Almost the entire set of traditional media has been trashing one of the presidential candidates 24/7 for more than a year in an attempt to taint the sample and get the desired results at the polls: to no avail so far. We'll have the same kind of sampling issue in November in the USA. The measurement issues are sampling issues, but not the sample size issue: that's what I'm trying to say. – Aksakal Jun 25 '16 at 13:43
  • 1
    @Tim: while democracy is not necessarily about statistical significance, neither is it necessarily about "50%+1". While it is true that 17 million voted for Brexit, it is also true that 16 million voted against it. Using "50%+1" as a criterion is not intrinsic to the notion of democracy. – Martin Argerami Jun 25 '16 at 13:54
  • @MartinArgerami you are right, but discussing it in detail is off-topic – Tim Jun 25 '16 at 14:00
  • 4
    IMHO, this is a non-statistical question with a thin veneer of statistics added to disguise that fact. As I read it, the assumption "there is no error in the voting process" eliminates all statistical considerations and necessarily channels the discussion into what "voting ... means" in a democracy. That's a matter of political science and philosophy, not statistics. – whuber Jun 25 '16 at 14:10
  • For the record: I did not vote to close as opinion-based (which is the closure reason now), I voted to close as off-topic. I have now checked revision history and I am not sure I understand why it was re-opened at all, after @whuber's initial closure (as off-topic). – amoeba Jun 25 '16 at 15:16
  • @Amoeba The system is a little confusing that way. It appears that when a mod votes to close, only that reason is posted even when others gave other reasons. – whuber Jun 25 '16 at 18:23
  • It's a pity that the question was closed, because the subject is interesting. It would be possible to moderate it to keep the pure politics out and statistics in. The idea of election is to find out what people want. Statisticians could opine on how to do it best. Obviously, nobody's going to listen, but it's still worth a discussion – Aksakal Jun 05 '18 at 20:52

6 Answers

17

I agree with @Underminer that there is no sampling error, not because the sample is large, but because there was no sampling involved. Nobody was sampled to vote. There obviously was some negligible fraction of people who wanted to vote but weren't able to (e.g. had a car accident that day), or who cast invalid votes, but that's the only "sampling" here.

The result is exact; there is no error involved, since the whole population took part in the vote (some took part by not taking part in it). Some people decided to vote, some didn't. Some decided to vote leave, some didn't. Democracy is not about statistical significance, but about what really happened. Voting is not intended to learn about people's opinions, but to make a decision. Actually, people sometimes do not vote according to what they think, but to make a statement or to achieve something. For example, in an election people may vote not for their preferred candidate but for their second-preferred one, if they think that candidate has a greater chance of winning.

Tim
  • Consider the case of a grey area where the voting population isn't very sure about what's good for them. For example, the case of having 2 candidates that are almost equally good. In such case, I think those who vote, will probably differ unsystematically as I think their votes might have a distribution close to a uniform one. My goal here is not to re-define democracy (a political topic) but rather to see what can we say about whether Brexit was a grey area? – caveman Jun 24 '16 at 20:54
  • 2
    @caveman no matter if they are sure, or not, what matters is how they voted since voting is about actual votes. For sure, some people did not have clear opinion, with some of them voting and some not, but this also doesn't matter since what counts is the actual votes of those who voted. – Tim Jun 24 '16 at 21:05
  • If I understand it correctly, your point is about how democracy interprets votes? I agree with you. However, I am not interpreting it in the way politicians do. I am trying to use the population to identify whether a decision is good, bad, or not very clear. This is a different usage of voting. – caveman Jun 24 '16 at 21:09
  • @caveman if you considered voting in terms of "error", then in such cases as Brexit it would be **impossible** to make a decision, since the percentage is close and you could always argue that it was "not enough". 50% + 1 vote is enough, end of story. – Tim Jun 24 '16 at 21:16
  • I don't consider it as a voting error. **Assumption 1** states that there are no voting errors. However, there could be some error in the natural neural network in the heads of the voters when modelling the voting process as a solution to find optimal solution to the Brexit problem. I.e., it can be thought as a *classification* error. I am not discussing the political interpretation. Yes, I know that politicians simply count 50+1 as enough, but this isn't about the political view. This is rather about a statistical view to try to estimate the confidence of the biological classifiers on Brexit – caveman Jun 24 '16 at 21:29
  • 2
    @caveman people change their minds all over the time, psychologists wrote thousands of papers about this... Yes, 51.9% doesn't mean that exactly 51.9% of Brits is 100% sure about leaving EU. People can even be unsure about comparing lengths of lines (https://en.wikipedia.org/wiki/Asch_conformity_experiments)... – Tim Jun 24 '16 at 21:37
  • You're oversimplifying the voting setup. For instance, in some places, such as USA, there are voting violations, where members of certain communities are systematically denied their voting rights by different means. It's done so that you could say "they didn't show up, they must not have wanted to vote." This happens all the time at all levels up to the presidential elections. You can also count on the media that systematically shapes opinion in ways that are beneficial to some forces etc. The world is not as simple and clean as you try to make it look – Aksakal Jun 25 '16 at 13:34
  • @Aksakal I didn't say anything like this... Still, besides voting violations (and I do not think we are discussing them here), there is no "random sampling" involved. – Tim Jun 25 '16 at 13:49
  • @Tim you wrote "there was no sampling involved." Unless British voting is perfect, there is sampling involved. In the USA there's this [voting ID laws](http://www.huffingtonpost.com/2013/08/25/colin-powell-voter-id-laws-backfire_n_3813092.html) issue, which induces a systematic bias against certain minorities in the sample. – Aksakal Jun 25 '16 at 14:03
  • @Aksakal That may be stratification, perhaps, but it still is not sampling. Sampling involves the *investigator* selecting the individuals to be counted, whereas in a voting process the individuals themselves make the selection. This matters, since a voting process, as Tim stated, does not involve statistics. Approximately 25% of eligible voters decided not to participate. It is impossible, even by holding a second referendum (since opinions change over time), to figure out the opinions of this silent 25%. The only fact available is that 51.9% of the votes were cast in favour of Brexit. – user3697176 Jun 27 '16 at 13:21
  • @user3697176, it's not entirely true. As I wrote before some voters are regularly denied their voting rights. They did not choose to skip voting, but were filtered out. – Aksakal Jun 27 '16 at 13:47
  • 1
    @Aksakal I am not going to comment on who is eligible to vote and who isn't. I am also not going to comment on how difficult it might be to obtain the necessary credentials. That is politics and as such not on-topic here. From a statistical perspective, each eligible voter has a certain probability of not voting. This probability might be influenced by certain factors that may or may not be related to their preferences, but each eligible voter chooses (not) to exercise that right at his/her discretion. – user3697176 Jun 27 '16 at 15:03
  • @user3697176 the probability of being denied voting rights is higher for minorities, young and otherwise vulnerable population. You could say that missing voting data is not at random. This is a sampling issue. – Aksakal Jun 27 '16 at 16:04
9

51.9% is the percentage of voters who want to leave. Since the sample size is so large (>33 million), there is virtually no random sampling error.

Statistical significance testing would try to determine if the difference in remain and leave could be explained by random sampling error alone, and the difference would certainly be significant (see @caveman's answer).
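To make the size of the random sampling error concrete, here is a minimal sketch of the test described above. It treats the ballots as if they were a simple random sample (exactly the assumption questioned in the next paragraph) and uses the vote counts quoted elsewhere in this thread; the variable names are illustrative only.

```python
import math

# Vote counts quoted elsewhere in this thread.
leave = 17_421_887
remain = 16_146_297
n = leave + remain

p_hat = leave / n                  # observed leave share among valid votes
se_null = math.sqrt(0.25 / n)      # standard error of a proportion if the true share were 50%
z = (p_hat - 0.5) / se_null        # distance from 50%, measured in standard errors
margin_95 = 1.96 * se_null         # 95% margin of error, if this were a random sample

print(f"leave share = {p_hat:.4%}")
print(f"z statistic = {z:.1f}")
print(f"95% margin  = {margin_95:.4%}")
```

The z statistic comes out in the hundreds and the margin of error is a small fraction of a percentage point, which is why random sampling error is not the interesting source of uncertainty here.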

The problem with this approach is that statistical significance makes a very strong assumption that the sample is representative of the entire population (all of Britain), not just those who vote.

The non-response rate (those that do not vote) is enormously important in determining if more than half of all of Britain wants to 'leave', and is difficult to measure. Non-response bias is created when subgroups who are less likely to vote have systematically different views. Based on exit-polls, for example, millennials were less likely to vote, but more likely to vote to remain, which biases the results when trying to represent the population of all of Britain.

For this reason, statistical significance testing in its traditional sense is largely inappropriate.
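A toy calculation shows how non-response bias can flip a result. Every number below is hypothetical and chosen only for illustration; none of it is Brexit data.

```python
# Hypothetical two-group population; every number here is made up for illustration.
groups = {
    # name: (share of population, turnout rate, share preferring "leave")
    "younger": (0.40, 0.50, 0.35),
    "older": (0.60, 0.80, 0.58),
}

# Preference in the whole population: weight each group by its size.
population_leave = sum(size * pref for size, _, pref in groups.values())

# Preference among those who actually vote: weight each group by size * turnout.
votes_cast = sum(size * turnout for size, turnout, _ in groups.values())
voter_leave = sum(size * turnout * pref for size, turnout, pref in groups.values()) / votes_cast

print(f"leave share in population: {population_leave:.1%}")
print(f"leave share among voters:  {voter_leave:.1%}")
```

With these made-up inputs the leave share is below 50% in the population but above 50% among voters, even though no individual vote is miscounted.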


Assumptions: We need to define some terms for any of this to make sense and avoid political discussion of what voting is trying to accomplish. Here are my definitions:

Population: Every person living in Britain

Sampling Frame: Every voting eligible person capable of voting

Sampling Methodology: Voluntary response, the act of voting is participating in the survey

Sample: The individuals who actually vote

In this setup, the sample proportion could be used (for better or worse) to estimate the percentage of all people who lean towards remain (or leave).

Underminer
8

You ask

What does the 51.9% Brexit vote to leave mean?

It means 51.9% of the voters voted to leave.

E.g. does it mean that the public really wants to leave EU? Does it simply mean that the public is unsure and needs more time to think? Or is it something else?

The votes comprised $17\,421\,887$ "leave" votes and $16\,146\,297$ "remain" votes, indicating that $12\,931\,353$ eligible voters did not vote; approximately $18$ million inhabitants are not eligible voters. Since neither the collection of actual voters nor the collection of eligible voters is "the public", and neither is a representative (random, unbiased, pick a relevant adjective) sample of "the public", the 51.9% Brexit vote is uninformative about your second and subsequent questions.
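The counts above can be checked with a few lines of arithmetic. This sketch assumes the electorate figure of 46,499,537 that is quoted in a comment elsewhere in this thread.

```python
leave = 17_421_887
remain = 16_146_297
eligible = 46_499_537   # electorate size quoted in a comment in this thread

votes = leave + remain
non_voters = eligible - votes

print(f"valid votes:               {votes:,}")
print(f"eligible non-voters:       {non_voters:,}")
print(f"turnout (valid votes):     {votes / eligible:.1%}")
print(f"leave share of votes:      {leave / votes:.1%}")
print(f"leave share of electorate: {leave / eligible:.1%}")
```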

It might have been possible to construct a questionnaire responsive to your questions. This does not seem to have been what happened in the referendum as implemented.

Eric Towers
  • 1
    Could you please discuss the meaning of the votes in relation to the *voters* (i.e. not the entire population), beyond the surface conclusion that it means "*51.9% voted leave*"? I wonder what is the extent of information we can extract from this. – caveman Jun 25 '16 at 13:53
  • 4
    Caveman, this comment, more than any other, demonstrates your question is non-statistical. Because 51.9% (together with the total counts) constitute *all* the data in evidence about the voters, and there is no uncertainty (unless you want to challenge the accuracy of the counting, which is a separate issue), your rejection of this answer implies you are looking for *non-statistical* conclusions. – whuber Jun 25 '16 at 14:13
  • What if we model Brexit as a binary classification problem, and consider voters as estimates of classifiers that are a member of an ensemble. In this model, the goal is not to identify what the majority of citizens want, but rather the goal is to identify the optimal classifier from the space of classifiers. We can then use some measures to test the goodness of such human-voter-based classifier ensemble. E.g. we may use Perplexity or something else that is suitable for this binary classification task where ground truth is unknown (e.g. we clearly don't know if leave is better than remain). – caveman Jun 25 '16 at 18:56
  • @caveman : Given that the ground truth is (correctly) unknown, what metric would you use to "identify the optimal classifier from the space of classifiers"? Any such metric encodes the biases of the analyst that picks the metric, except for the metric "reproduces the result of the vote", for which metric you already know the answer: 51.9%/48.1%. – Eric Towers Jun 26 '16 at 21:21
  • @EricTowers I've taken this to politics.stackexchange.com where I talked about different methods -- https://politics.stackexchange.com/questions/11433/voting-methods-that-take-voters-stability-into-account – caveman Jun 26 '16 at 22:10
2

TL;DR

I simulated an unsure population (see details below) $R=1000$ times, and then measured the probability of observing a leave vote of $\ge 51.9\%$ under such an unsure simulated population. This gave me the simulated probability that an unsure population can reach a leave vote of $51.9\%$ or greater.

This simulated probability of leave under the unsure population is $0$.

It may be redundant, but I also did the same for remain, to measure the probability that such an unsure population produces a remain vote of $\le 48.1\%$.

This simulated probability of remain under the unsure population is also $0$.

Therefore I conclude that the Brexit vote is not a noisy side effect of an unsure or confused population. There seems to be a systematic reason that is driving them to leave the EU.

I uploaded the simulator code here: https://github.com/Al-Caveman/Brexit

Details

Given Assumption 1, the possible answers (or hypotheses) are:

  • $H_0$: The public is unsure.
  • $H_1$: The public confidently wants to leave.

Note: that it is impossible that the public confidently wants to remain because we have ruled out voting errors.

To answer this question (i.e. whether $H_0$ or $H_1$), I try to measure:

  • The probability that an unsure population can achieve $\ge 51.9\%$ leave vote.
  • Or, the probability that an unsure population can achieve a $\le 100\% - 51.9\% = 48.1\%$ remain vote.

If this probability is low enough, we can conclude that the public confidently wants to leave (i.e. $H_1$). However, if this probability is large enough, we can conclude that the public is unsure about deciding Brexit (i.e. $H_0$).

In order to measure this probability, we need to know the distribution of an unsure British population in such a binary voting system as Brexit. Therefore, my first step is to simulate this distribution under the assumption below:

  • Assumption 2: a population that is composed of unsure individuals will have a random chance vote. I.e. every possible answer has an equal chance of being chosen.

In my view this assumption is fair/reasonable.

Additionally, we model the leave and remain campaigns as two distinct processes as follows:

  • Process $P_{\text{leave}}$ with the output $O_{\text{leave}} = [l_1, l_2, \ldots, l_n]$.
  • Process $P_{\text{remain}}$ with the output $O_{\text{remain}} = [r_1, r_2, \ldots, r_n]$.

where:

  • $n$ is the total population of UK (includes non-voters).
  • For any $i \in \{1,2,\ldots,n\}$, $l_i,r_i \in \{0, 1\}$. An output value of $0$ signifies that a voter has voted no for the subject process, and $1$ signifies that a voter has voted yes for the same process.

subject to the following constraint:

  • For any $i \in \{1,2,\ldots,n\}$, $l_i$ and $r_i$ cannot both be $1$. I.e. $l_i=1$ necessarily implies that $r_i = 0$, and $r_i=1$ necessarily implies that $l_i=0$. This is due to the fact that a voter $i$ among the population $\{1,2,\ldots,n\}$ cannot vote to both leave and remain at the same time.

For example, if $O_{\text{leave}} = [1,0,0]$, it means that out of a population of $3$, one has voted yes to leave and two have voted no to leave.

Likewise, if $O_{\text{remain}} = [0,1,0]$, it means that out of a population of $3$, one has voted yes to remain and two have voted no to remain.

Note that in both of the examples above, there is one member of the population that has not voted for any of the processes (or campaigns). Specifically, the third voter (i.e. $O_{\text{leave}}[3] = O_{\text{remain}}[3] = 0$).
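As a small illustration, the two example arrays and the constraint can be written out directly. The variable names are illustrative and are not taken from the linked simulator.

```python
# The two example output arrays from the text, for a population of 3.
o_leave = [1, 0, 0]
o_remain = [0, 1, 0]

# Constraint: no voter may have a 1 in both arrays.
assert all(not (l and r) for l, r in zip(o_leave, o_remain))

leave_votes = sum(o_leave)
remain_votes = sum(o_remain)

# 1-based indices of population members who voted for neither process.
abstained = [i + 1 for i, (l, r) in enumerate(zip(o_leave, o_remain)) if l == 0 and r == 0]

print(leave_votes, remain_votes, abstained)  # prints: 1 1 [3]
```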

What we know from here is that out of $33,568,184$ ballot papers, $51.9\%$ have voted to leave EU (i.e. $100-51.9=48.1\%$ voted to remain). This means:

  • $n = 33,568,184$.
  • $33,568,184 \times 0.519 = 17,421,887.496$ have voted yes to the leave campaign. I.e. $$ \sum_{i=1}^{33,568,184}O_{\text{leave}}[i] = 17,421,887.496 \approx 17,421,887 $$
  • $33,568,184 \times (1-0.519) = 16,146,296.504$ have voted yes to the remain campaign. I.e. $$ \sum_{i=1}^{33,568,184}O_{\text{remain}}[i] = 16,146,296.504 \approx 16,146,297 $$

Therefore, we define the output arrays as follows:

  • For all $i \in \{1,2,\ldots, 17421887\}$, $O_{\text{leave}}[i] = 1$.
  • For all $i \in \{17421887+1,17421887+2,\ldots, 33568184\}$, $O_{\text{leave}}[i] = 0$.
  • For all $i \in \{1,2,\ldots, 17421887\}$, $O_{\text{remain}}[i] = 0$.
  • For all $i \in \{17421887+1,17421887+2,\ldots, 33568184\}$, $O_{\text{remain}}[i] = 1$.
  • By Assumption 2, for all $i \in \{1,2,\ldots, 33568184\}$, $O_{\text{unsure},m}[i] = C$, where $C$ is a uniformly distributed random variable that takes values in $\{0,1\}$ (e.g. a fair coin toss), and $m$ is a number that identifies a particular random instantiation of $O_{\text{unsure},m}$. In other words, the probability that two distinct random instantiations of $O_{\text{unsure},m}$ equal each other, i.e. $O_{\text{unsure},1} = O_{\text{unsure},2}$, is $0.5^{33,568,184}$.

Finally, we define the $p_{\text{leave}}$ value of the leave process as follows: $$ p_{\text{leave}} = \frac{1}{R}\sum_{m=1}^R \begin{cases} 1 & \text{if } \Big(\sum_{i=1}^{33,568,184} O_{\text{leave}}[i]\Big) \le \Big(\sum_{i=1}^{33,568,184} O_{\text{unsure},m}[i]\Big)\\ 0 & \text{else} \end{cases} $$ where $R$ is the total number of simulation rounds, and in each round $m$ a fresh random instance of $O_{\text{unsure},m}$ is drawn.

Likewise, we define the $p_{\text{remain}}$ value of the remain process as follows: $$ p_{\text{remain}} = \frac{1}{R}\sum_{m=1}^R \begin{cases} 1 & \text{if } \Big(\sum_{i=1}^{33,568,184} O_{\text{remain}}[i]\Big) \ge \Big(\sum_{i=1}^{33,568,184} O_{\text{unsure},m}[i]\Big)\\ 0 & \text{else} \end{cases} $$

To answer that, I simulated the above in C using $R=1,000$ and the output is:

total leave votes: 17421887
total remain votes: 16146297
simulating p values............ ok
p value for leave: 0.000000
p value for remain: 0.000000

In other words:

  • $p_{\text{leave}} = 0$.
  • $p_{\text{remain}} = 0$.
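The procedure above can be sketched in a few lines of Python. This is not the C program linked in this answer; it is a minimal re-creation under Assumption 2, with illustrative names, a fixed seed, and fewer rounds than the $R=1,000$ used above so that it runs quickly.

```python
import random

LEAVE = 17_421_887
REMAIN = 16_146_297
N = LEAVE + REMAIN  # 33,568,184 ballots


def count_ones(x):
    # int.bit_count needs Python 3.10+; fall back to counting characters otherwise.
    return x.bit_count() if hasattr(x, "bit_count") else bin(x).count("1")


def simulate(rounds, seed=0):
    """Return (p_leave, p_remain) estimated from `rounds` simulated unsure populations."""
    rng = random.Random(seed)
    hits_leave = 0
    hits_remain = 0
    for _ in range(rounds):
        # N fair coin flips packed into one integer; the number of 1 bits is the
        # "yes" total of one simulated unsure population.
        unsure_total = count_ones(rng.getrandbits(N))
        hits_leave += LEAVE <= unsure_total
        hits_remain += REMAIN >= unsure_total
    return hits_leave / rounds, hits_remain / rounds


p_leave, p_remain = simulate(rounds=100)
print(f"p value for leave:  {p_leave:.6f}")
print(f"p value for remain: {p_remain:.6f}")
```

Packing the coin flips into a single integer is a design choice made for speed; it draws the same kind of random instance as filling an array of $0$s and $1$s one element at a time.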
caveman
  • 2
    Perhaps more important in this case is the non-response rate (i.e. individuals who do not vote). The margin of error (or measure of statistical significance) only takes into account random sampling error. Non-response bias is NOT included in this, and it is much more impactful than random sampling error in a poll with such a large sample size. – Underminer Jun 24 '16 at 19:46
  • Here it says that UK has $46,499,537$ eligible voters. Meaning $46,499,537 - (17421887+16146297) = 12,931,353$ didn't vote. Any idea how to interpret such unvoting population? Source: https://en.wikipedia.org/wiki/European_Union_Referendum_Act_2015#Eligible_voters – caveman Jun 24 '16 at 20:09
  • 3
    There is no statistically satisfactory way to deal with non-random missing data. – Underminer Jun 24 '16 at 20:12
  • Those who haven't voted could be individuals who don't care about politics (e.g. no more trust). Alternatively, such unvoters could be those who were not sure. Or it could be a mixture of the two. What would happen if we assume that "*all unvoters are unsure*"? Would this be an upper bound for testing whether the current situation was one where the public was feeling that Brexit was a *grey area*? – caveman Jun 24 '16 at 21:03
  • 3
    There is a confusion here about the nature & scope of statistics. You are attempting to create a *process* model of voting, & how that can inform the mechanisms & validity of governance & public decision making. This is a worthwhile task *in Political Science*. It is simply not statistics (although statistics is involved). – gung - Reinstate Monica Jun 24 '16 at 21:36
1

You could ask a slightly different question: Assuming that 50% of a very large population voted "Yes", and you asked a random sample of size S, what is the probability that 51.9% of your sample responded "Yes", depending on the sample size?

The expected number of "Yes" votes is $0.5\,S$. The variance is $0.25\,S$, so the standard deviation is $0.5\,S^{1/2}$. A deviation of the actual from the expected number of "Yes" votes of more than 6.1 standard deviations has a chance of about one in a billion.

This happens when $0.019\,S$ (the difference between 50% and 51.9%) equals $6.1 \times 0.5 \times S^{1/2}$, i.e. $S = (6.1 \times 0.5 / 0.019)^2$, or S ≈ 25,800.
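The arithmetic can be checked with a short sketch. It assumes the normal approximation to the binomial, which the "6.1 standard deviations" figure implies; the variable names are illustrative.

```python
import math

z = 6.1       # number of standard deviations used in the answer
gap = 0.019   # 51.9% minus 50%

# Normal tail probabilities beyond z standard deviations.
one_sided = 0.5 * math.erfc(z / math.sqrt(2))
two_sided = 2 * one_sided

# Solve gap * S = z * 0.5 * sqrt(S) for the sample size S.
S = (z * 0.5 / gap) ** 2

print(f"one-sided tail: {one_sided:.2e}")
print(f"two-sided tail: {two_sided:.2e}")
print(f"sample size S:  {S:,.0f}")
```

The two-sided tail is roughly one in a billion, and the sample size comes out close to the 25,800 quoted above.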

gnasher729
0

This is another solution using an analytical method instead of a simulation.

Previously, I simulated an unsure population as one whose votes are random chance guesses. So out of $n$ voters, an unsure population would tend to vote leave or remain $0.5$ of the time.

In order for an unsure population to get exactly $51.9\%$ vote on leave, there needs to be $17,421,887$ 1s in $O_{\text{leave}}$. The probability for this is $0.5^{33,568,184}$. Similarly, the probability of getting $17,421,887 + 1$ votes is also $0.5^{33,568,184}$. This goes on.

This is the probability of getting $\ge 17,421,887$ votes: $$\begin{split} \sum_{i=17,421,887}^{33,568,184} 0.5^{33,568,184} &= (33,568,184-17,421,887) \times 0.5^{33,568,184}\\ &= 8.39663381928984 \times 10^{-10105024}\\ &\approx 0 \end{split} $$

($8.39663381928984 \times 10^{-10105024}$ calculated by WolframAlpha)

And this is the probability of having $\ge 51.9\%$ of an unsure population vote leave.
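As a cross-check, here is a hedged sketch that treats the leave count of an unsure population as Binomial($n$, $0.5$). Under that model, the probability of exactly $k$ leave votes is $\binom{n}{k} 0.5^{n}$ rather than $0.5^{n}$, because $\binom{n}{k}$ different arrangements give the same count, so the sketch differs from the sum above. The function names are illustrative, and everything is done in log space because the numbers underflow ordinary floating point.

```python
import math

LEAVE = 17_421_887
N = 33_568_184


def log10_binom_pmf(n, k):
    """log10 of P(X = k) for X ~ Binomial(n, 0.5), computed in log space."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return (log_choose + n * math.log(0.5)) / math.log(10)


def log10_upper_tail_bound(n, k):
    """Upper bound on log10 of P(X >= k), valid for k > n / 2.

    Successive terms shrink by a factor of at most r = (n - k) / (k + 1) < 1,
    so the tail is at most P(X = k) / (1 - r) by the geometric series.
    """
    r = (n - k) / (k + 1)
    return log10_binom_pmf(n, k) - math.log10(1 - r)


bound = log10_upper_tail_bound(N, LEAVE)
print(f"log10 P(X >= {LEAVE:,}) <= {bound:.1f}")
```

The bound is a negative exponent in the thousands, so the practical conclusion is unchanged: under this model the probability is still effectively $0$.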

caveman