
Suppose I am 50 years old and a study found that in my city 10 people (CI = 4 people) die at age 50 each year.

Now it's July, I have a very malignant disseminated cancer, and 14 people have already died at age 50 in my city this year. Can I then be sure that I will not die during the rest of the year?

Clarification: If the mean is 10 people and the 95% CI is ±4 people, then it's improbable (less than 2.5%) that the city will have more than 14 deaths. Because it's rare for my city to have more than 14 people dying, it's improbable that more than 14 deaths will occur. Now 14 have already died, and I am from the city. I shouldn't die, or my city will be in a very rare situation. That's my misunderstanding.

Another example:
I studied patients in the internal medicine department retrospectively and found that 1% of cases have a rare, very-difficult-to-diagnose disease. Once the study is over, I go to the department, which always has 200 patients, and manually exclude, e.g., 180 of them as not having that rare disease. Can I then be sure that among the rest (20 patients) there will be 2 cases of that rare, very-difficult-to-diagnose disease?

Glen_b
Elmahy
  • If you have malignant cancer you shouldn't be basing any conclusion as to your own life expectancy on overall mortality rates. – dsaxton Jan 19 '16 at 17:19
  • I sincerely hope that no application of statistics will cause anyone to catch any disease, rare or not! – whuber Jan 19 '16 at 17:52
  • @whuber It's not to cause it, but to catch it and treat people early, before complications. E.g., ovarian tumors are often not discovered until it's too late to treat the patient. – Elmahy Jan 19 '16 at 18:25
  • Firms like 23andme are trying to do something along these lines, but based on your DNA sample. The FDA doesn't let them do too much yet, but they're getting there slowly. – Aksakal Jan 20 '16 at 03:03

2 Answers

"14 people died already, so we shouldn't see any more" is simply a (very slightly) more sophisticated version of the gambler's fallacy.

This is essentially a confusion between a marginal and a conditional probability. The probability that you see more than 14 when you haven't seen any yet may be low, but the probability that you see at least one more when you already have 14 may be quite high.

Consider (as a rough model) that ten thousand people each have p=1/1000 of dying from this disease in one year (occurring randomly over the year), so that in a year you expect 10 people to die with the condition (there's about a 92% chance that you'll see no more than 14).
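A minimal sketch of the arithmetic behind the 92% figure, using the same rough model (10,000 people, each independently with probability 1/1000 of dying in the year, so the yearly count is Binomial(10000, 0.001)). The helper name `binom_cdf` is just an illustrative choice:

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n, p = 10_000, 1 / 1000  # rough model: population size and yearly death probability
p_at_most_14 = binom_cdf(14, n, p)

print(f"E[X]       = {n * p:.1f}")
print(f"P(X <= 14) = {p_at_most_14:.3f}")
print(f"P(X > 14)  = {1 - p_at_most_14:.3f}")
```

Note that this is the *marginal* probability, computed before anything has been observed that year.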

Now, however, imagine that in the first seven months of the year you got 14 deaths. You should not expect any more deaths, right? Well, no, at this point you should expect about 4 additional deaths in the remaining 5 months.
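A sketch of where "about 4" comes from, under the same rough model plus one assumption made explicit here: a death, if it happens, is equally likely at any time of the year. The 14 deaths so far do not lower anyone else's risk; they only remove 14 people from the pool:

```python
n, p = 10_000, 1 / 1000  # rough model: population and yearly death probability
deaths_so_far = 14       # observed in the first 7 months
elapsed = 7 / 12         # fraction of the year already gone

# Assumption: death times are uniform over the year, so for someone still
# alive after 7 months the chance of dying in the last 5 months is
# P(die in last 5 months) / P(survive first 7 months).
q = (p * (1 - elapsed)) / (1 - p * elapsed)

survivors = n - deaths_so_far
expected_more = survivors * q                     # expected additional deaths
p_at_least_one_more = 1 - (1 - q) ** survivors    # P(one or more further deaths)

print(f"expected additional deaths: {expected_more:.2f}")
print(f"P(at least one more death): {p_at_least_one_more:.3f}")
```

This is the *conditional* calculation: given 14 deaths already, further deaths are close to certain under this model.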

Indeed, having so many in 7 months should also make you question whether the p=1/1000 is correct (or whether the size of your target population might have been off), and in that case, the expected number of additional deaths may be higher still.
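To put a number on that doubt, here is a minimal sketch that asks how likely 14 or more deaths in the first seven months would be if $p = 1/1000$ really held (same rough model; the uniform timing of deaths over the year is an assumption):

```python
from math import comb


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    return 1 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


n, p = 10_000, 1 / 1000
p_7_months = p * 7 / 12  # chance a given person dies in the first 7 months
tail = binom_sf(14, n, p_7_months)

print(f"expected deaths in 7 months: {n * p_7_months:.2f}")
print(f"P(14 or more in 7 months | p = 1/1000): {tail:.4f}")
```

A tail probability this small (under 1%) is the sense in which the observed count casts doubt on $p$ or on the assumed population size.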

If you're considering the risk to a single person about whom you know nothing except that they are in the target population, their probability of dying from that disease in the remaining 5 months would be about $0.001 \times 5/12$ (assuming $p$ was correct).
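Spelling out the "about": if a death, when it happens, is equally likely at any point in the year (the uniform-timing assumption of the rough model), then for someone known to be alive after the first seven months

$$P(\text{dies in last 5 months} \mid \text{alive after 7 months}) = \frac{p \cdot \tfrac{5}{12}}{1 - p \cdot \tfrac{7}{12}} = \frac{0.001 \times 5/12}{1 - 0.001 \times 7/12} \approx 0.000417,$$

which agrees with $0.001 \times 5/12 \approx 0.000417$ to three significant figures, because surviving seven months barely changes the denominator.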

(This discussion requires an odd assumption, essentially that death occurs suddenly. Rather than labor the point that it might take several months or even years, let alone that some people go into remission, we could cover the issue more simply by assuming that we were discussing diagnosis instead of death.)

However, in your discussion you mention "I have a malignant disseminated cancer" ... that means you aren't a random person from the population. Such a diagnosis would certainly affect the chance that you die from cancer: $P(\text{dies from cancer} \mid \text{aged }50) \neq P(\text{dies from cancer} \mid \text{aged }50\text{ and diagnosed with cancer})$.

Your second example suggests the same error. A priori you expect 1% to have the rare condition, but if the 90% you eliminated were chosen at random, the remaining patients would each still have a 1% chance of having the condition. 1% of 20 is 0.2, and there's about an 82% chance that none of them has the rare condition ($0.99^{20} \approx 0.82$).
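A short sketch of these numbers, assuming each of the 20 remaining patients independently has a 1% chance of the disease (which is what random exclusion gives you). It also shows how unlikely the "exactly 2 cases" outcome from the question is:

```python
from math import comb

prevalence = 0.01  # 1% of cases have the rare disease
remaining = 20     # patients left after randomly excluding 180 of 200

p_none = (1 - prevalence) ** remaining
expected = remaining * prevalence
p_exactly_two = comb(remaining, 2) * prevalence**2 * (1 - prevalence) ** (remaining - 2)

print(f"expected cases among the 20: {expected:.1f}")
print(f"P(no cases):        {p_none:.3f}")
print(f"P(exactly 2 cases): {p_exactly_two:.4f}")
```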

[If those that were eliminated were not chosen randomly (e.g. only the most healthy-looking were tested), the calculation doesn't work; in that case you may indeed expect a number a good deal nearer to 2 in the remainder.]
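Even in the most favorable non-random case, the count is still random. The sketch below assumes an idealized exclusion (hypothetical, not from the answer): all 180 excluded patients are truly disease-free, so every case on the 200-patient ward ends up among the 20 that remain, and each patient independently has a 1% chance of the disease:

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


ward_size, prevalence = 200, 0.01
# Idealized assumption: exclusion is perfect, so the number of cases among the
# remaining 20 equals the number of cases on the whole ward (ignoring the
# negligible chance of more than 20 cases).
expected_cases = ward_size * prevalence
p_zero = binom_pmf(0, ward_size, prevalence)
p_two = binom_pmf(2, ward_size, prevalence)

print(f"expected cases:     {expected_cases:.1f}")
print(f"P(no cases):        {p_zero:.3f}")
print(f"P(exactly 2 cases): {p_two:.3f}")
```

So the expected number approaches 2, but "exactly 2" is still far from certain.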

Glen_b

No, in both cases.

For the first case: let $X$ be the number of people dying per year in your city. The study found the expectation $E[X] = 10$. This does not imply that the probability of $X$ being something other than 10 equals zero; i.e., $Pr[X \neq 10] = 0$ is most certainly wrong. Therefore it is also possible that $Pr[X = 16] > 0$.
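For a concrete illustration, assume (hypothetically; this answer does not specify a distribution) that the yearly count is Poisson with mean $E[X] = 10$, close to the binomial rough model in the other answer. Even the mean itself is not a particularly likely single outcome, and 16 deaths is far from impossible:

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


lam = 10  # hypothetical: yearly deaths modeled as Poisson with mean 10

print(f"Pr[X = 10] = {poisson_pmf(10, lam):.4f}")
print(f"Pr[X = 16] = {poisson_pmf(16, lam):.4f}")
```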

Statistics can never make a sure statement about a single fate. They are only a summary of many fates and can give an idea of the probability of certain outcomes. However, making definite statements like the ones you made above is almost always impossible.

Denwid
  • If the mean is 10 people and the 95% CI is ±3 people, then it's improbable (less than 2.5%) that the city will have more than 13 deaths. Because it's rare for my city to have more than 13 people dying, it's improbable that more than 13 deaths will occur. Now they have already died, and I am from the city. I shouldn't die, or my city will be in a very rare situation. That's my misunderstanding. – Elmahy Jan 19 '16 at 18:19
  • $Pr[X > 13]$ is less than 2.5%, i.e., Pr[the city has more than 13 deaths] is less than 2.5%. If I died, the city would have more than 13 deaths (which is improbable)! – Elmahy Jan 19 '16 at 18:32
  • Yes, the probability is small (based on past observations). But you asked "can I be sure", and that you cannot. Also, just because a past study found E[X] = 10 with CI = 3 doesn't mean this will also hold this year, because X is most likely non-stationary (at least in this case). For example, some environmental factors could make the cancer rate rise this year. – Denwid Jan 19 '16 at 18:39