6

Recently the media reported on a political poll that stated that "46% of Republican voters in Mississippi think that interracial marriage should be illegal". One example story (of many around the 'net) is this one from the NY Daily News: Interracial marriage should be illegal, say 46% of Mississippi Republicans in new poll. From that article:

A survey conducted last month shows 46% of GOP voters in the state believe interracial marriage should be illegal - a plurality of the people questioned

Given that I know many people in interracial marriages (including in my own family), I was motivated to track down the full story rather than succumb to the obvious hysteria. I located the press release on the poll from Public Policy Polling here (PDF), which states:

46% of these hardcore Republican voters believe interracial marriage should be illegal, while 40% think it should be legal. With Barbour included, Huckabee gets more support (22%) from the former than the latter (15%), as does Palin (13-6). The support for Bachmann (10-2), Gingrich (13-8), and Pawlenty (4-1) works the opposite way.

And at the end of the first page they state:

PPP surveyed 400 usual Mississippi Republican primary voters from March 24th to 27th. The survey’s margin of error is +/-4.9%.

Now what I don't understand is how to interpret the results, especially the margin of error value.

From the raw numbers, 46% (~50%) of 400 voters is roughly 200 respondents who think that interracial marriage should be illegal. But from Mississippi's statistics in the 2008 federal election, McCain polled about 725,000 Republican votes (to Obama's 556,000). My gut feeling is that you can't extrapolate that 46% number to the other 724,600 Republican voters and still retain a 5% margin of error.

So is the media misrepresenting the numbers (really? the media would do that?!), or are there statistics at work that I don't understand?

Thanks for your help!

amoeba
  • 93,463
  • 28
  • 275
  • 317
Peter M
  • 161
  • 1
  • 2
  • This is politics. Do you really think they took the time to conduct a poll, much less an accurate poll? Since there's no way to prove or disprove the numbers to the public, why spend the money? Just make up whatever you want and say the numbers came from a poll. Most people will go for it. – bill_080 Oct 03 '11 at 18:19
  • 4
    @bill: The OP is asking about *statistical* interpretation of the results. Indeed, the question is interesting and involves several easy sources of confusion. While results of such polling should not be viewed in a vacuum, and it is important to try to ferret out potential sources of bias, the main intent seems to be to uncover some statistical subtleties. For this reason, I'm hopeful that a nice answer might appear. It's a worthy question, in my view. – cardinal Oct 03 '11 at 18:35
  • It might be worth perusing other questions on this site that handle similar issues. [Here](http://stats.stackexchange.com/questions/6916/is-every-blue-t-shirted-person-a-systematic-sample/) is one. You might also look through the **Related** links listed on the right margin. – cardinal Oct 03 '11 at 18:37
  • @cardinal - thanks for your support :) I did a cursory search of questions before I posted this one, and nothing stood out as matching what I thought I was trying to ask. And in fact I never even saw the question you suggested, although if I had I would have reconsidered my post. I'll put it all down to not knowing (or remembering) the terms for what I was trying to find. – Peter M Oct 03 '11 at 18:50
  • @cardinal: I understand the OP's question, however don't you have to trust the results before a meaningful conversation can begin? My point is simple. The purpose of a political poll is the message. How you get there doesn't really matter. And, if it doesn't make sense, should that be a surprise? – bill_080 Oct 03 '11 at 18:54
  • @bill_080 Yes it is political, yes it is a hot button topic. The people around me have reacted to the headlines (and it is hard not to), but it is the process and not the topic I was interested in. But being political is not a crime in itself - in the next 16 months we are going to be inundated with all sorts of political polls, from all sorts of sources (good and bad). So being able to understand the results will always be a good thing. – Peter M Oct 03 '11 at 18:56
  • @Peter: Go back and look closely at my responses. This is not a "hot button" issue. It is a matter of priors. If you can't trust the source, why put any effort into the analysis? You can't prove or disprove anything about that poll. That's the point. That's why polls are used in politics. – bill_080 Oct 03 '11 at 19:01
  • 1
    @bill_080 You could probably accuse me of having an unspoken agenda of "Are the statistics valid enough to support the inference that the 46% value can be extrapolated to the remaining population? And by inference, can I trust PPP's results?". The answers below support the extrapolation part, but with the caveat that it depends on PPP being honest with their methods. Asking a question here can't validate PPP, but it can show that my gut feeling about the extrapolation is wrong. However if I had asked the same question using a different subject we wouldn't be having this conversation :) – Peter M Oct 03 '11 at 19:07
  • @bill_080 Sure it is a hot button topic .. if you are in an interracial marriage, you don't like the idea of people invalidating your existence. But this is delving into politics and away from statistics. – Peter M Oct 03 '11 at 19:09
  • @Peter: Do you trust PPP? If so, why? If not, why? If you don't have some way to trust or distrust PPP, isn't this a coin flip? On any poll, this is, and always has been the problem. Once you find a group you can trust (which is rare), the next step is, did they conduct the poll correctly? If you can get comfortable with that, then the analysis begins. – bill_080 Oct 03 '11 at 19:14
  • 2
    @bill_080 It seems that we are asking two different questions. My question is "In what manner can you validly extrapolate a sample of 400 to 725,000", but you want to ask "Did the pollsters screw up or mislead when collecting their data?". My question can be answered here; your question is unanswerable without inside knowledge at PPP. – Peter M Oct 03 '11 at 19:23
  • @Peter: You are correct. However, this process builds from the bottom up. If you have no prior information that tells you if PPP is lying to you, or if they screwed up the poll, then whatever extrapolation you get must be viewed in that context. My point is simple. You can't prove or disprove anything from that poll. It's useless. – bill_080 Oct 03 '11 at 19:30
  • @bill Please make your case in a reply. You have good points to make, but they are buried in these comments. If you feel further discussion is needed to clarify the question, chat would be a good medium to facilitate that. – whuber Oct 03 '11 at 20:05

4 Answers

9

The claim that the margin of error is $4.9$% follows from assuming that the poll was conducted as if a box had been filled with tickets--one for each member of the entire population (of "hardcore Republican voters")--thoroughly mixed, $400$ of those were blindly taken out, and each of the associated $400$ voters had written complete answers to all the poll questions on their tickets. These $400$ poll results are the "sample."

The "as if" raises plenty of practical questions that go to whether the poll really can be viewed as arising in such a way. (Can we really think of the population as represented by a definite set of tickets? Is it fair to assume all tickets are completely filled out? Was the sampling conducted in a manner akin to drawing from a thoroughly mixed box? Etc.) Other respondents have listed some of those questions. Granting, however, that this is an adequate model of the poll leads us to the crux of the question: to what extent do these $400$ tickets represent the entire population? We never know for sure, but we can develop some expectations by studying this process of sampling from a box of tickets.

To do this, we focus on one question at a time. We might as well view each ticket as bearing either the "yes" or "no" answer for that question. We now compare the true survey results (that is, the true proportions of yeses among all tickets in the box) to the results of the myriad possible samples of $400$ tickets. (There are more than $1.9 \times 10^{1475}$ such samples.) We have to make the comparison for any possible true proportion, but even so, it's merely a matter of mathematical calculation. This calculation shows that the observed response in at least $95$% of all such samples lies within $\pm 4.9$% of the population value no matter what that population value might be. For example, if exactly $50$% of the tickets in the box are "yes," then $95$% of the possible samples of $400$ tickets will contain between $50-4.9$% = $45.1$% and $50+4.9$% = $54.9$ yeses.

(That computed value of $95$% actually depends on the true proportion of yeses in the population: if that proportion is very small or very large, we find that quite a bit more than $95$% of all samples will give results accurate to within the margin of error. A true proportion of $50$% is the worst case, which is used because we don't know the true proportion!)
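The $\pm 4.9$% figure itself can be reproduced with the usual normal approximation for a sample proportion, $z\sqrt{p(1-p)/n}$, evaluated at the worst case $p = 0.5$. A minimal sketch (this formula is the standard method, though the answer above does not spell out the arithmetic):

```python
import math

# Normal-approximation margin of error for a sample proportion:
# z * sqrt(p * (1 - p) / n), with p = 0.5 as the worst case.
n = 400    # sample size from the poll
p = 0.5    # worst-case true proportion of "yes" tickets in the box
z = 1.96   # multiplier covering 95% of samples

moe = z * math.sqrt(p * (1 - p) / n)
print(f"margin of error: {moe:.1%}")  # prints "margin of error: 4.9%"
```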

This is all the margin of error means. Because $95$% is a substantial fraction of all possible samples, we feel it's highly likely that the one sample that was actually obtained will be among these $95$%. A doubter is allowed to suppose the sample could be one of the remaining $5$%: we cannot prove him wrong (based only on the poll results, anyway). Yet, similar calculations show (for instance) that the proportion of yeses will differ from the true proportion by more than $12.2$% in only one of every million possible samples. It's still possible the poll is among these one-in-a-million samples, but we have very shaky grounds to believe that. Thus, there is usually a limit to what constitutes a "reasonable" amount of doubt about what the true proportion may be, and it's rarely as extreme as $\pm 100$%.

The fundamental insight afforded by these calculations is that once the number of tickets in the box becomes moderately large (a few thousand in this case), the margin of error does not depend on how many tickets are in the box. It should be intuitively clear that the only thing that really matters for a relatively small sample is the proportion of yeses in the box, because the proportion determines the chance of drawing a "yes" or "no" and that proportion doesn't appreciably change between drawing the first and drawing the last of the $400$ tickets.
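That independence from the box size can be made concrete with the finite population correction factor $\sqrt{(N-n)/(N-1)}$, a standard adjustment not used in the answer above. Computing the corrected margin of error for a sample of $400$ from boxes of very different sizes shows the correction is negligible once the box holds tens of thousands of tickets:

```python
import math

def moe(n, N, p=0.5, z=1.96):
    """95% margin of error, including the finite population correction."""
    fpc = math.sqrt((N - n) / (N - 1))
    return z * math.sqrt(p * (1 - p) / n) * fpc

# The same sample size drawn from boxes of very different sizes:
for N in (10_000, 100_000, 725_000):
    print(f"N = {N:>7}: {moe(400, N):.2%}")
```

The results all round to about 4.8-4.9%: the population size barely matters, exactly as the paragraph above argues.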

In summary, assuming it's accurate to view the poll as acting like drawing tickets from a box, our right to "extrapolate" from the poll to the population (a process more formally known as statistical inference) is an uncertain one, because we can always be wrong; but when the sample is just a small fraction of the population, the amount by which we might be in error in making that extrapolation depends primarily on the size of the sample, not the size of the population. This is why most credible polls, whether of local or international scope, use samples of a few hundred to a few thousand. It is rare that larger samples are needed to achieve a high chance of getting reasonable accuracy.

whuber
  • 281,159
  • 54
  • 637
  • 1,101
  • +1. Another good mathematical exposition is here: http://terrytao.wordpress.com/2008/10/10/small-samples-and-the-margin-of-error/ – ShreevatsaR Oct 04 '11 at 06:32
3

I won't try to deliver my own answer, but I would refer you to the "What Is a Survey?" booklet compiled by the Survey Research Methods Section of the American Statistical Association. (Fritz Scheuren, who endorses it on the title page, is a past President of the ASA from about five years ago. He was a high-profile statistician in federal agencies such as the Social Security Administration and the Internal Revenue Service, and is now semi-retired from government, continuing to work as a VP of the National Opinion Research Center at the University of Chicago.) The booklet delivers a clear and concise explanation of when and why you can, or cannot, extrapolate survey findings to the target population.

StasK
  • 29,235
  • 2
  • 80
  • 165
2

To answer your question:

It is possible to extrapolate from a sample of 400 to the views of all 700,000, but this is contingent on the sample being random. (Statistical power is the topic you'd want to look into to confirm this.) If I ask 400 of my closest friends, this doesn't work. To get a truly random sample, I'd have to get the list of all 700,000 people and use a random number generator to pick 400 from it. Even so, there might be some selection biases. For example, if we're only calling landline telephones, then young people (who often have only cell phones) would be underrepresented in the sample. It's still possible to correct for these issues, but you have to be pretty careful.
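A minimal sketch of that random-number-generator step (the integer voter IDs here are hypothetical stand-ins for the real voter list, which would hold names and phone numbers):

```python
import random

# Hypothetical setup: IDs 0..724,999 stand in for the full list of
# Republican primary voters.
population = range(725_000)

random.seed(1)                            # for reproducibility
sample = random.sample(population, 400)   # every voter equally likely

# No duplicates, and every pick comes from the full list.
assert len(set(sample)) == 400
assert all(0 <= voter < 725_000 for voter in sample)
```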

Nate Silver's blog has some really good posts on the reliability of different polling firms, problems with their techniques, and correct inference for US political polls.

John Doucette
  • 2,113
  • 1
  • 15
  • 24
  • I think the key question here is, as you point out, how they performed the sampling. It's a fairly narrow target they are pursuing. – Jonathan Oct 04 '11 at 17:01
0

The short answer is yes, you can extrapolate.

Longer answer: The key question is whether the pollsters took a random sample of the population. They claim to have taken a random sample of Republican primary voters. But this is difficult: people refuse to answer polls, aren't home, or other things go wrong; even worse, the people who do answer are not a random sample of the whole population (for instance, younger people are less likely to have landline telephones). Most pollsters therefore try to weight the sample they get to match a known population. Exit polls of Republican primaries give good estimates of various traits of this population.
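One common way to do that weighting is post-stratification. A toy sketch (all group names and numbers below are invented for illustration; PPP's actual weighting scheme is not described here):

```python
# Toy post-stratification: reweight respondents so that age groups match
# known population shares. Every number here is made up for illustration.
sample_counts = {"18-34": 60, "35-64": 220, "65+": 120}          # who answered
population_share = {"18-34": 0.30, "35-64": 0.45, "65+": 0.25}   # known targets

n = sum(sample_counts.values())  # 400 respondents
weights = {g: population_share[g] / (sample_counts[g] / n)
           for g in sample_counts}

# A respondent in an underrepresented group counts for more than one person:
# 18-34s are 15% of this sample but 30% of the target, so their weight is 2.0.
print(weights)
```

The weighted tallies then add up to the original 400 respondents, but with each group contributing in its known population proportion.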

Reputable pollsters (such as PPP) try hard to do this in a balanced way.

So, can you extrapolate from a relatively small sample to a large population? Yes you can, but there are some caveats.

Peter Flom
  • 94,055
  • 35
  • 143
  • 276
  • 1
    Does the firm conducting the survey adhere to the MRA ethical guidelines for surveys? If so, you should be able to see the exact question wording. Question wording is very important for topics like this. If not, the survey is likely garbage designed to make interesting news OR shock people into giving money. – zbicyclist Sep 15 '16 at 01:26
  • Another issue with polls is that, in the case of elections, they represent opinion at one point in time but the intent is to predict the future result of the election. Events that occur between the time the poll is taken and the time the election is held can influence opinion and change the result. – Michael R. Chernick Nov 28 '16 at 00:42