Flawed human intuitions:
This is a very common and pernicious confusion. You can read about this under the Wikipedia entry for the Gambler's Fallacy. Psychologists have also studied this phenomenon. Amos Tversky and Daniel Kahneman documented it in their famous paper Belief in the law of small numbers (the title plays on the law of large numbers in statistics, discussed below). Theoretical work on cognitive mechanisms that help to produce this fallacy has been done by Ruma Falk and Clifford Konold (see, e.g., their paper, Making sense of randomness: Implicit encoding as a basis for judgment; more citations here).
Runs:
When you notice several heads in a row, you are perceiving a run. The (perfectly intuitive) belief is that runs are unlikely, thus, either the coin must not be fair, or it must revert to tails soon. Indeed, this intuition has been formalized by statisticians into a test for randomness / independence (i.e., the runs test). One thing to realize is that with lots of flips (a long series), runs of length 4 (for example) are actually quite common. Here is a quick simulation I ran to check how often I would see 4 or more of the same result in a row, given series of Bernoulli trials of lengths 20 and 50:
isRun = function(x){
  runL = 1  # length of the current run
  maxR = 1  # longest run seen so far
  # we iterate through the length of the series
  for(i in 2:length(x)){
    # this increments the run length if the result is the same,
    # but restarts the counter otherwise
    runL = if(x[i]==x[i-1]) runL+1 else 1
    # if the current run length is longer than the previous max,
    # the new value is used
    maxR = max(maxR, runL)
  }
  return(maxR)
}
r4.20 = c() # these will store the results
r4.50 = c()
set.seed(1) # this makes the code reproducible
for(i in 1:10000){
  x20 = rbinom(20, size=1, prob=.5) # we generate series of length 20 & 50
  x50 = rbinom(50, size=1, prob=.5)
  r4.20[i] = ifelse(isRun(x20)>3, 1, 0) # 1 if the maximum run length is 4 or longer
  r4.50[i] = ifelse(isRun(x50)>3, 1, 0)
}
mean(r4.20) # [1] 0.7656 # ~77% of series
mean(r4.50) # [1] 0.9796 # ~98%
But what if you've only flipped your coin 4 times (thus far)? The probability of getting heads all 4 times is $.5^4=.0625$ (and the probability of 4 identical results, whether heads or tails, is twice that). Given that people flip coins commonly, this should happen quite often (more than one time in twenty).
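A quick sanity check of that figure, comparing the analytic value with a small simulation (all base R, nothing beyond what the code above already uses):

```r
# probability that all 4 flips of a fair coin come up heads
set.seed(1)
p.analytic  = .5^4
p.simulated = mean(replicate(10000, sum(rbinom(4, size=1, prob=.5))==4))
p.analytic   # 0.0625
p.simulated  # close to 0.0625
```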
Convergence to long run probability:
What about the fact that the number of heads in your series should converge to half the length of the series? This is true; it is guaranteed by the law of large numbers. The relative proportion is likely to converge fairly quickly. For example, there is a 95% probability that the observed percentage will fall within 2 standard errors of the true probability, $\pi$, where
$$
S.E.(p) = \sqrt{\pi(1-\pi)/N}.
$$
Thus, when the true probability is .5 and $N=5$, 95% of the time the proportion of heads should fall within $.5\pm 2\times .5/\sqrt{5} = .5\pm 2\times .2236 = (.053,.947)$, and with $N=100$, within $(.4,.6)$. (Actually, the normal approximation is imperfect in the first case, because $N$ is so small.) However, the proportion will still fall outside of that interval 5% of the time. Importantly, although the series will converge to .5, there's no guarantee until you 'reach' infinity. In addition, the convergence is due as much to the growing denominator as to the numerator being close to $.5\times N$; that is, the number of heads can be very far from half in raw numbers, but close as a proportion of the total.
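The interval computation above can be sketched as a small helper (the function name `prop.interval` is just for illustration):

```r
# 95% normal-approximation interval for the observed proportion,
# given the true probability pi and the number of trials N
prop.interval = function(pi, N){
  se = sqrt(pi*(1-pi)/N)
  c(lower=pi - 2*se, upper=pi + 2*se)
}
round(prop.interval(.5, 5), 3)    # roughly (.053, .947)
round(prop.interval(.5, 100), 3)  # (.4, .6)
```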
Random variables vs. Realized values:
While it is helpful to understand something about the intuitions that lead us astray and the true mathematical properties that govern these phenomena, the key concept is understanding the distinction between random variables and realized values. When you have a coin balanced on your thumb about to be flipped 5 times in a row, those outcomes are random variables, and the laws of probability apply to how they will behave in the long run*. When the coin is lying on your forearm with one side facing up (whether you have yet seen which side or not), that outcome is a realized value. The laws of probability don't make impossible what has already happened (nor could they). Thus, $Pr(H)=.5$, and $Pr(H|HHHH)=.5$ as well, because the four H's on the right side of the vertical bar (the given 4 prior outcomes) are realized values, not random variables, and are not related to the probability that the outcome of the next flip will be a head (at least under independence; with dependent data, the prior result must be a part of, or stored somehow within, the data generating process). Likewise, $Pr(HHHHH)=.03125$, but $Pr(HHHHH|HHHH)=.5$, because once the first four heads are realized, only the fifth flip remains random.
I'll acknowledge that this still isn't necessarily very intuitive; you have millennia of evolution to overcome. Nonetheless, I have found that these considerations have helped me, and others, to think about randomness more clearly.
*Note that this discussion pertains to the Frequentist conception of probability.