6

I was told an anecdote by someone today who was trying to prove a point regarding safety. They said "50 people have been living in [area a] for the past ~~two years~~ one year (apparently I remembered the conversation wrong) and there have been no incidents, therefore the area is safe for more people to live there."

[area a] happens to be what the government considers a high-risk zone, with an elevated threat to personal safety (specifically death). I know this person's reasoning is flawed but I'd like to know the exact name and explanation of that flaw, because I feel this is quite a common one.

I see two main factors contributing to the error:

  • Small sample size
  • The risk is heavily weighted toward the "death" end of things; it's not an elevated risk of getting a paper cut

How would I call out this flaw despite the fact that the person is technically right in saying "There have been no incidents"?

Edit for clarity: This [area a] is equivalent to a building, and is occupied by more than just this sample set. The area is within a larger region in which there is an elevated risk of harm or death, and the area offers no special protection against it. Incidents of risk are rare, but certainly higher than the background rate and do occur in this larger region.

thanby
  • Sounds like the law of small numbers http://pirate.shu.edu/~hovancjo/exp_read/tversky.htm – Jeremy Miles Sep 02 '15 at 19:40
  • I don't see any error *per se*: this appears to be a well-founded attempt to reason with data, albeit perhaps with a small dataset (which is not an error in itself). But are those 50 people just those your interlocutor happens to know or are they a complete census of all the people living in the area? – whuber Sep 02 '15 at 19:41
  • Just the ones he happens to know, in a much larger region where incidents *do* occur. – thanby Sep 02 '15 at 19:43
  • And for all I know other incidents do occur within [area a] that just don't affect this sample set. – thanby Sep 02 '15 at 19:44
  • I call this fallacy "the unconvincing argument fallacy". The argument is too vague to be wrong. It's just not convincing. – zkurtz Sep 02 '15 at 19:49
  • "Unconvincing argument" is exactly why I brought it here :) I'm hoping there's a more scientific name /explanation to go along with it. – thanby Sep 02 '15 at 19:51
  • This reminds me of a joke. The "argument" of an imaginary criminal in court is: "I can bring 50 witnesses who did not see what you claim I did." – Vladislavs Dovgalecs Sep 02 '15 at 20:56
  • @thanby: I studied mathematics, and I've never fully understood the need people have to taxonomise incorrect arguments. It follows or it doesn't ;-) But sure, if lots of people make the exact same mistake then there should be a name for it, whether it's "flipping the sign while copying" or "anecdotal fallacy". – Steve Jessop Sep 02 '15 at 22:18
  • I'd just call it "invalid sampling". ... Or "This is how we got the 'Damn Lies and Statistics' correlation". – Brock Adams Sep 03 '15 at 00:53
  • Related ELU: [belief it won't happen because it never has?](http://english.stackexchange.com/questions/145403/is-there-are-term-for-when-you-believe-that-because-something-hasnt-happened-i) *false analogy/generalization* – Mazura Sep 03 '15 at 06:00
  • See also, [wiki/Faulty_generalization](https://en.wikipedia.org/wiki/Faulty_generalization) *fallacy of defective induction* – Mazura Sep 03 '15 at 06:14
  • Obligatory xkcd reference: [what-if no 27](https://what-if.xkcd.com/27/). Did you know that only 93% of humans who have ever lived have actually died? That means there's a 7% chance of being immortal, right...? Even better if you're a member of The Beatles - only 50% of them have died... – AndyT Sep 03 '15 at 14:05
  • You need to be more specific about what the danger is because, at the moment, it's not even clear how the argument is wrong. For example, if the risk is catastrophes (e.g., earthquakes), the argument is wrong because catastrophes happen less often than every two years. If the risk is contamination that causes, e.g., cancer, then the argument is wrong because two years isn't long enough for cancers to form. On the other hand, if the risk is something like bears killing people, then no incidents in two years is pretty good evidence that the place is safe. – David Richerby Sep 03 '15 at 17:53
  • Nate Silver makes this point in his book The Signal and The Noise. If an area has historically had an earthquake on average once every 35 years but it hasn't had one for 40, this doesn't mean it's going to happen tomorrow or that it isn't, nor does it change the statistic. As many others have pointed out. – Dave Kanter Sep 03 '15 at 18:53
  • I wonder if this is about C8 – shadowtalker Sep 04 '15 at 04:44
  • As far as I can tell, this is just plain old non sequitur: his conclusion just doesn't logically follow from his data. – Davor Sep 04 '15 at 07:16
  • @DavidRicherby The risk is closer to "bears killing people" and in the larger region it does happen multiple times per year, it just hasn't happened in this small area. Per the beautiful xkcd logic AndyT pointed out, that must mean everyone who lives in this area is immortal. – thanby Sep 04 '15 at 09:42

8 Answers

18

I don't have a specific name for the fallacy, but here is a reference that I think is relevant (along the law of small numbers line):

The Most Dangerous Equation

Also, a statistical rule of thumb (see section 2.9) says that an approximate 95% confidence interval for the 2-year incidence rate, given none in 2 years, would be from 0 to $\frac{3}{50}$, so the incidence rate could be as high as 6%. So if you moved another 1,000 people in, it would not be surprising to see 60 incidents in the next 2 years.
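The arithmetic behind that interval (the "rule of three" for zero observed events) is easy to check; a minimal sketch, using the numbers from this answer:

```python
# "Rule of three": with 0 events observed in n trials, an approximate
# one-sided 95% upper bound on the event rate is 3/n.
n = 50  # people observed over the period, zero incidents

rule_of_three = 3 / n            # approximate upper bound: 0.06
exact = 1 - 0.05 ** (1 / n)      # exact bound, solving (1 - p)^n = 0.05; ~0.058

print(f"rule-of-three bound: {rule_of_three:.4f}")
print(f"exact bound:         {exact:.4f}")

# At a 6% rate, 1,000 more residents could plausibly mean on the order of
# 60 incidents over the same period.
print(f"plausible incidents among 1,000 people: {rule_of_three * 1000:.0f}")
```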

Thinking about it more: if the small area was chosen because it had no incidents while there were some in the larger area, then this would be a variation on the Texas Sharpshooter Fallacy.

Greg Snow
  • In case the "Most Dangerous Equation" link goes dead, it states that small samples show more variability so you are more likely to get a more extreme result ("very safe" or "very dangerous") from looking only at a small area. I'm sure there ought to be a name for this phenomenon, but I can't think of it. – Silverfish Sep 03 '15 at 00:56
  • It seems that some people call it the Sample Size Fallacy or Small Sample Fallacy: http://www.oxfordreference.com/view/10.1093/oi/authority.20110803100439475 – Flounderer Sep 03 '15 at 02:30
  • Also worth pointing out that this is the basis for [funnel plots](https://en.wikipedia.org/wiki/Funnel_plot), which show the increased variability in smaller samples. – Silverfish Sep 03 '15 at 10:05
  • Oddly enough when challenged on the subject the person replied that [area a] is safer than [area b] (which is a short distance away) because [area b] actually had a pretty large incident in the past decade, so I think the Texas Sharpshooter Fallacy does apply somewhat, though it wasn't their original argument – thanby Sep 05 '15 at 11:05
13

The plural of "anecdote" is not "data."

(Also quoted at https://stats.stackexchange.com/a/8404.)

whuber
5

It also sounds like the parable of the Thanksgiving turkey:

http://www.businessinsider.com/nassim-talebs-black-swan-thanksgiving-turkey-2014-11

Every morning the farmer feeds the turkey well. After 1000 days the turkey argues that the farmer is benevolent and the pattern will continue. But day 1001 is Thanksgiving...

(Note for global readers: Thanksgiving is a US holiday on which it's customary to eat turkey.)

wonder
  • Can you explain "the parable of the thanksgiving turkey" (eg, in case the link goes dead)? – gung - Reinstate Monica Sep 03 '15 at 00:46
  • That "black swan" argument may be the best counterpoint to the assumption of safety, because as Nassim describes, one single incident would compromise the entire assumption (which is a pretty big deal when you're talking about human lives). – thanby Sep 04 '15 at 09:49
5

This is not a fallacy, but rather the Problem of induction, as popularized by David Hume.

Hugh
4

General case of the survivorship fallacy:

Looking only at/for things that didn't fail skews your perception. This may lead you into untested, and thus failure-intolerant, behaviour.

The usual example is observing planes returning from air combat: "Do you need to increase armor in the places where the returning planes were hit?" Supposedly, that's where planes are likely to be hit.

However, the answer is, counter-intuitively, "No, because that's where planes are likely to be hit *and survive*." Hits there are evidently survivable.

You achieve real results when you increase armor in the places where the "survivors" have *not* been hit, because that's where the "non-survivors" were hit.
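The skew can be demonstrated with a toy simulation (the zone names and fatality rates below are made up purely for illustration):

```python
import random

random.seed(0)

# Hits land uniformly over two zones, but hits to the "engine" zone are
# usually fatal while hits to the "fuselage" zone are usually survivable.
FATALITY = {"engine": 0.9, "fuselage": 0.1}

true_hits = {"engine": 0, "fuselage": 0}      # hits across all planes
observed_hits = {"engine": 0, "fuselage": 0}  # hits seen on returning planes

for _ in range(10_000):
    zone = random.choice(["engine", "fuselage"])
    true_hits[zone] += 1
    if random.random() > FATALITY[zone]:      # the plane survives and returns
        observed_hits[zone] += 1

print("all planes:      ", true_hits)       # roughly 50/50
print("returning planes:", observed_hits)   # dominated by fuselage hits
```

Counting only the survivors, nearly all visible damage is on the fuselage, precisely because fuselage hits are the survivable ones.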

For your case (singular):

Suppose you have to move a single person into an area with incidents leading to deaths. Should they move into a sub-area that has not been hit by an incident?

No: for those sub-areas you simply have no conclusive data.

Instead, they should move into a sub-area where incidents do happen but don't lead to deaths. The goal is not to avoid incidents entirely but to survive one if it happens, right?

If you don't want any incident to happen, you shouldn't move into the larger area in the first place!

For your case (plural):

If you want to move a statistically relevant number of people into the area where incidents are survivable, you first need to check whether the reason incidents are survivable is that area's low population density.

If incidents are survivable only in low-density areas, moving people in wouldn't make the people safe but would make the area unsafe.

Another view on things:

If there are 1000 people in the larger area, of which 20 died in the last incident, then there are still 980 survivors left to tell the tale. Is it safe because more people survived than died?

Surely most of the 980 people weren't even close to the 20 who died when it happened. Does it become any safer if you just ask those?

Can you ask the 20 dead people, if they'd still consider it safe?

The bottom line is that you'll feel safe as long as you ask survivors who didn't witness the incident. And since you can only ask survivors, it's probable that they didn't witness it.

Hence, the survivorship fallacy.

Related fallacies:

Others have mentioned other fallacies. I don't want to repeat them in detail. However, I do see that they apply as well, so here's a compilation of why each applies and how they differ:

  • Survivorship fallacy: Concentrating on favourable results only.
  • Texas Sharpshooter fallacy: Choosing a sub-sample in hindsight.
  • Hot hand fallacy: Interpreting random variation of results as an indication of the probability distribution, especially when looking at the most recent history.
  • Law of small numbers: Relying on insufficient data.
  • Base rate fallacy: Underestimating the importance of general information in favour of more specific information.

There's another well-known fallacy that I originally mistook for "Hot hand". Now that I think about it, it actually doesn't apply:

  • Gambler's fallacy: Misunderstanding the law of large numbers to mean that independent events would even out in the long run.

It's a kind of inverted hot-hand fallacy: falling for "hot hand", you'd bet on what happened most often in recent history, because it seems more likely. Falling for "gambler", you'd bet against what happened most often, because the opposite seems due to even out in the long run.

NoAnswer
  • I like your summary at the bottom, but that's not what the gambler's fallacy says. The gambler's fallacy is the idea that future samples have a tendency to compensate for (variations from expected values of) past samples. – Neil G Sep 04 '15 at 06:12
  • Thanks for the thorough answer. The one that really made me think was the "survivor's fallacy" because in this case that actually applies. The person making the assumption doesn't personally know anyone who's been connected to a victim (the overall incident rate is still small, it's just much higher than average for a larger geographical region), so I think that's clouding their judgment to some extent. – thanby Sep 04 '15 at 09:53
3

This sounds like the hot hand fallacy to me.

https://en.wikipedia.org/wiki/Hot-hand_fallacy

When teaching intro stats I found a lot of students fell for this fallacy. The idea, in basketball terms: a player has made X shots in a row, so he is supposedly more likely to make shot X + 1. Same idea here: X people live here with no incidents, therefore no incidents should occur if X + 1 people are present.
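A quick sketch of why this is a fallacy for independent events (toy numbers; it assumes shots are i.i.d. with a fixed make-probability, which the comments below debate):

```python
import random

random.seed(1)

# Independent shots with a fixed make-probability p
p = 0.6
shots = [random.random() < p for _ in range(200_000)]

# Make rate immediately after three consecutive makes
after_streak = [shots[i] for i in range(3, len(shots))
                if shots[i - 3] and shots[i - 2] and shots[i - 1]]

print(f"overall make rate:       {sum(shots) / len(shots):.3f}")
print(f"make rate after 3 makes: {sum(after_streak) / len(after_streak):.3f}")
```

Both rates hover around 0.6: conditioning on the streak doesn't move the estimate, which is exactly what the fallacy gets wrong.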

Lauren Goodwin
  • This needs to be phrased very carefully. There is no fallacy in the belief that a basketball shot is more likely to be successful if the shooter has made their last $X$ shots than if they'd just missed their last $X$ shots: that's just saying that good players make more shots than bad players. The fallacy is believing that a specific player, who makes shots with probability $p$, will make their next shot with probability greater than $p$ if they've made their previous $X$ shots; it turns out that successive shots by a given player are close to independent. – David Richerby Sep 03 '15 at 18:07
  • The Wikipedia page needs to be updated. There's some pretty good evidence that there is a reason to believe in streakiness now. Gelman stays up on it so you can check his blog. – John Sep 03 '15 at 22:16
  • @John Interesting. I must admit that I was a little skeptical of even the correctly phrased fallacy: surely every player has good days and bad days and having just made a streak makes it less likely that the player being observed is having a bad day. – David Richerby Sep 04 '15 at 08:13
  • I can say that independence of success over time is not necessarily true. The event of my "first serve being in" in tennis is highly positively autocorrelated. This would have a severe effect on the probability of double faulting if using only "first" serves, even as second serves. Based on independence, probability of double fault = 1 - (1-p)^2, where p is the probability of a serve being in. Positive autocorrelation makes the actual probability of double faulting using only "first" serves much higher. Being in the groove can be a very real phenomenon in sports, and in other endeavors. – Mark L. Stone Sep 04 '15 at 14:44
  • I've been thinking about this and while it may play a part in the person's initial assessment I don't think it's the complete answer. The assumption is that safety is guaranteed for x->infinity not just x+1 – thanby Sep 05 '15 at 11:02
2

This is the base rate fallacy:

If presented with related base rate information (i.e. generic, general information) and specific information (information only pertaining to a certain case), the mind tends to ignore the former and focus on the latter.

In this case, the base rate of death is quite high, but the specific information is that there are at least 50 people living in the area who have been unharmed.

shadowtalker
  • That's a good logical point but I'd almost call it a double-base-rate, because the base rate for the larger region is still low compared to the population, but it's much higher than the base rate for the rest of the world (I'm simplifying a bit for the sake of comment length but you get the idea). – thanby Sep 04 '15 at 09:56
  • @thanby maybe, but that depends on what you define as your "base." It's about confusing marginal and conditional distributions. I'm also stretching the definition a little more than I realized when I first posted this. – shadowtalker Sep 04 '15 at 19:23
0

Statistical inference becomes invalid when there is no variability, and in this case the variability is non-existent. So the only way that the argument:

"50 people have been living in [area a] for the past two years and there have been no incidents, therefore the area is safe for more people to live there."

can be examined is non-statistical, i.e. deterministic. Therefore the argument is methodologically valid (though not factually correct) only if it is read as

"50 people have been living in [area a] for the past two years and there have been no incidents, therefore the incident rate in the area is and will remain zero."

Wow. I am impressed with the confidence level of the person saying this.

Any implied inference of the type "if the rate is zero in the sample, we expect it to be small/acceptable/'normal' in the population" (which is how one could understand the "it is safe to live there" assertion) is garbage, both because there is no basis to extrapolate from sample to population and because there is no basis to extrapolate from the past/present to the future.

As Fisher would say, "get more data".

Alecos Papadopoulos
  • I wholeheartedly agree with your assessment. This person is indeed confident the incident rate will remain zero, and I'm also impressed (and a bit horrified) at their level of confidence. – thanby Sep 04 '15 at 10:07
  • But you can very well, say, construct a confidence interval based on a binomial observation of zero. That is valid statistical inference without variation. So, as stated, your claim is invalid. – kjetil b halvorsen Sep 04 '15 at 13:27
  • @kjetilbhalvorsen [maybe](http://andrewgelman.com/wp-content/uploads/2014/09/fundamentalError.pdf) – shadowtalker Sep 04 '15 at 19:24