7

Here is a problem I thought of:

  • Suppose I am watching someone flip a fair coin. Each flip is completely independent from the previous flip.
  • I watch this person flip 3 consecutive heads.
  • I interrupt this person and make the following offer: If the next flip results in a "head", I will buy you a slice of pizza; if the next flip results in a "tail", you will buy me a slice of pizza.

My Question: Who has the better odds of winning?


I wrote the following simulation using the R programming language. In this simulation, a "coin" is flipped many times ("1" = HEAD, "0" = TAILS). We then count the percentage of times HEAD-HEAD-HEAD-HEAD appears compared to HEAD-HEAD-HEAD-TAILS:

#load library
library(stringr)

#define number of flips
n <- 10^7

#flip the coin many times
set.seed(1)
flips = sample(c(0,1), replace=TRUE, size=n)

#count the percent of times HEAD-HEAD-HEAD-HEAD appears 
str_count(paste(flips, collapse=""), '1111') / n

0.0333663

#count the percent of times HEAD-HEAD-HEAD-TAIL appears
str_count(paste(flips, collapse=""), '1110') / n

0.0624983

From the above analysis, it appears as if the person's luck runs out: after 3 HEADS, there is a 3.33% chance that the next flip will be a HEAD compared to a 6.25% chance the next flip will not be a HEAD (i.e. TAILS).

Thus, could we conclude: Even though the probability of each flip is independent from the previous flip, it becomes statistically more advantageous to observe a sequence of HEADS and then bet the next flip will be a TAILS? Thus, the longer the sequence of HEADS you observe, the stronger the probability becomes of the sequence "breaking"?

Ben
stats_noob
  • I have edited your simulation to include ```set.seed``` so that this is a reproducible analysis; the numbers change very slightly from what you had, but the result is effectively the same. (In future, when you include simulations in a question, please set the seed so that other readers can reproduce your analysis with the same results.) – Ben Sep 12 '21 at 01:51
  • 1
    You say the coin tosses are independent, so the fourth toss can't depend on the first three. There has to be something wrong with your simulation, and @Demetri (+1) has explained what it is. – BruceET Sep 12 '21 at 02:00
  • @BruceET: I don't agree that Demetri's answer explains the problem; the problem here occurs in the faulty use of ```str_count``` to do the simulation (which fails to count overlapping occurrences of the pattern). – Ben Sep 12 '21 at 02:07
  • @Ben, ah apologies. – Demetri Pananos Sep 12 '21 at 02:10
  • 2
    Cross-posted: https://math.stackexchange.com/questions/4248011/flipping-coins-sequences-vs-independent-flips – Peter O. Sep 12 '21 at 03:42
  • 1
    If you get 5 heads in a row, what do you expect the HHHH count to be? Because `str_count('11111', '1111')` will return 1, not 2. And `11110` will count as a win for both heads and tails. – Hobo Sep 12 '21 at 01:51
  • 3
    I feel like both would have similar odds of winning with a fair coin as the actual result depends only on the final flip. – d.b Sep 12 '21 at 02:00

3 Answers

13

By default, str_count does not count overlapping occurrences of the specified pattern. The substring 1111 can overlap with itself substantially, whereas the substring 1110 cannot overlap with itself. Consequently, your calculation for the first substring is substantially biased: you are undercounting the number of times this pattern actually occurs in your simulation. Try this alternative method instead:

#Flip the coin many times
set.seed(1)
n     <- 10^8
FLIPS <- sample(c(0,1), size = n, replace = TRUE)

#Count the proportion of occurrences of 1-1-1-1
PATTERN.1111 <- FLIPS[1:(n-3)]*FLIPS[2:(n-2)]*FLIPS[3:(n-1)]*FLIPS[4:n]
sum(PATTERN.1111)/n
[1] 0.06246614

#Count the proportion of occurrences of 1-1-1-0
PATTERN.1110 <- FLIPS[1:(n-3)]*FLIPS[2:(n-2)]*FLIPS[3:(n-1)]*(1-FLIPS[4:n])
sum(PATTERN.1110)/n
[1] 0.0624983

With this alternative simulation (which counts overlapping occurrences of the patterns) you get proportions for the two outcomes that are roughly the same. If the coin flips are in fact independent and "fair" then each player has the same probability of winning the wager. Mathematically, the true probability of any particular run of four outcomes is $1/2^4 = 0.0625$, so that is what the above simulations are effectively estimating; the remaining small disparity in the simulation is due to random error.
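To see the overlap issue on a small input, here is a minimal sketch in base R. The helper names `count_overlapping` and `count_non_overlapping` are made up for illustration, and `gregexpr` is used as a stand-in on the assumption that its left-to-right scan skips overlapping matches in the same way the comments describe for `str_count`. The lookahead `(?=...)` is one common way to count every start position.

```r
# Count matches of `pattern` in `x`, allowing matches to overlap.
# The lookahead (?=...) matches at a start position without consuming characters.
# Assumes `pattern` contains no regex metacharacters (true for "1111" and "1110").
count_overlapping <- function(x, pattern) {
  hits <- gregexpr(paste0("(?=", pattern, ")"), x, perl = TRUE)[[1]]
  sum(hits > 0)  # gregexpr returns -1 when there is no match
}

# Count matches left to right; each search resumes after the previous match.
count_non_overlapping <- function(x, pattern) {
  hits <- gregexpr(pattern, x, fixed = TRUE)[[1]]
  sum(hits > 0)
}

run <- "11111110"  # seven heads followed by one tail

count_non_overlapping(run, "1111")  # 1: only the match at positions 1-4
count_overlapping(run, "1111")      # 4: starts at positions 1, 2, 3 and 4
count_non_overlapping(run, "1110")  # 1
count_overlapping(run, "1110")      # 1: this pattern cannot overlap itself
```

A long run of heads therefore contributes many overlapping copies of 1111 but only a few non-overlapping ones, while 1110 is counted the same either way.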

Ben
4

EDIT

The reason you are getting a different percentage for HHHH and HHHT is that you are counting the instances of 1111 and 1110 in one very long string. You are not breaking the flips into blocks of 4. In a very long string it is more likely for you to have 3 ones in a row than it is to have 4 ones in a row. Since you aren't checking groups of 4 to make sure the flips all belong to a single test, you will end up with more 1110 than 1111.

The correct way to code the problem is to group the coin flips into groups of 4. The following should be pretty easy to follow but is a bit slow.

#load library
library(stringr)

#define number of flips
n <- 100000

# Pre-assign a length of n to a data.frame
df <- data.frame(flip = character(n))

for(i in 1:n){
  df$flip[i] <- paste(sample(c(0,1),replace = TRUE,size = 4),collapse = "")
}

100*sum(df$flip == "1111")/n
# 6.259

100*sum(df$flip == "1110")/n
# 6.193
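If the loop is too slow, one possible vectorised sketch of the same grouping idea is below. It draws all the flips at once and reshapes them into rows of 4. The names `n_blocks`, `blocks` and `first_three_heads` are illustrative, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n_blocks <- 100000

# Each row is one independent experiment of 4 flips (1 = heads, 0 = tails)
blocks <- matrix(sample(c(0, 1), size = 4 * n_blocks, replace = TRUE), ncol = 4)

# TRUE for rows whose first three flips are all heads
first_three_heads <- rowSums(blocks[, 1:3]) == 3

# Percentage of blocks equal to 1111 and to 1110
pct_1111 <- 100 * mean(first_three_heads & blocks[, 4] == 1)
pct_1110 <- 100 * mean(first_three_heads & blocks[, 4] == 0)
c(pct_1111 = pct_1111, pct_1110 = pct_1110)
```

Both percentages should land close to 6.25, up to simulation noise.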

Original math based answer (missing code):

This is a common misinterpretation of statistics. Great example to learn from.

Your question was: after 3 coin flips, if I bet on the outcome of a 4th flip, what is the probability of each outcome on that 4th flip?

The 4th flip is now independent of the first 3 flips. There is no mechanism out there that grabs the coin and changes the probability of that 4th flip. The 4th flip will have a 50% chance of being heads, and a 50% chance of being tails.

Now, the question you are answering is: what is the probability a coin will be heads 4 times in a row? This is an entirely different question. It asks for the probability of getting 4 heads in a row, which depends on all four flips: not only does the 4th flip have to be heads, the first 3 have to be heads as well. There are 16 equally likely combinations of 4 coin flips and only 1 of them is 4 heads (1/16 = 6.25%).
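The two questions can be told apart by exact enumeration rather than simulation. The following is a small sketch; the column names `f1` to `f4` are chosen here for illustration.

```r
# All 2^4 = 16 equally likely outcomes of four flips (1 = heads, 0 = tails)
outcomes <- expand.grid(f1 = 0:1, f2 = 0:1, f3 = 0:1, f4 = 0:1)

# Marginal probability of four heads in a row
p_hhhh <- mean(rowSums(outcomes) == 4)

# Conditional probability that flip 4 is heads, given flips 1-3 were heads
three_heads <- outcomes$f1 == 1 & outcomes$f2 == 1 & outcomes$f3 == 1
p_h_given_hhh <- mean(outcomes$f4[three_heads] == 1)

p_hhhh         # 0.0625
p_h_given_hhh  # 0.5
```

Only two of the 16 rows start with three heads, and they differ only in the last flip, which is why the conditional probability is one half.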

Adam Sampson
2

Ah my friend, you are making a very simple mistake. In your simulation, you are computing the proportion of times a person could flip 4 heads in a row. But that is not what you have wagered.

You enter the bet having seen the three heads and have wagered only on the result of the next flip. Because each flip is independent, and the coin assumed fair, the probability of a heads is the same as a tails and hence the odds are even!

It would have been different had you made the wager at the beginning of the four flips. In such a case, we could just compute the binomial density. We would see that the probability of 4 heads in a row (conditioned on making only four flips) is very small and so you would have the better odds, again assuming the coin is fair. But having already seen the 3 flips and then wagering is akin to just betting on a single coin flip.
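As a rough check of both cases, here is a sketch that computes the binomial density with `dbinom` and then simulates the wager as stated, looking only at flips that follow three heads. The seed, the number of flips and the variable names are arbitrary choices for illustration.

```r
# Wager made before any flips: probability of 4 heads in 4 fair flips
dbinom(4, size = 4, prob = 0.5)  # 0.0625

# Wager made after seeing three heads: simulate one long fair sequence
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 10^6
flips <- sample(c(0, 1), size = n, replace = TRUE)

# For each flip from the 4th onward, were the three flips before it all heads?
idx <- 4:n
after_three_heads <- flips[idx - 3] + flips[idx - 2] + flips[idx - 1] == 3

# Proportion of heads on the flip that follows three heads in a row
p_heads_next <- mean(flips[idx][after_three_heads])
p_heads_next
```

With a fair, independent coin the last value should be close to 0.5, so neither side of the pizza bet has an edge.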

Demetri Pananos
  • 1
    (-1) This answer does not seem to me to accurately diagnose the problem. Although the simulation is not identical to the wager, the pattern ```1111``` should still occur with the same marginal probability as ```1110``` in the model. – Ben Sep 12 '21 at 02:06
  • 1
    @Ben The question was not about the simulation, it was about the wager. I feel like I have appropriately addressed the titular question bolded in the post. – Demetri Pananos Sep 12 '21 at 02:08
  • I don't think I agree with that view of the question. My reading is that he is confused by the fact that the simulation shows ```1111``` to be less likely than ```1110``` (due to his coding error), and he is seeking guidance on how to reconcile that with the assumption of independence. *If it were true* that ```1111``` is less likely than ```1110``` then surely that would indeed impact the solution to the wager (so that he would be correct that it is better to bet on tails). – Ben Sep 12 '21 at 02:12
  • @Ben We can agree to disagree on the purpose of the question, but I address the *actual* question (literally titled **My Question**). I think a combination of my answer and yours is sufficient to answer any confusion OP has, so if OP wants he can accept yours. – Demetri Pananos Sep 12 '21 at 02:15
  • My concern with this answer is that it gives the misleading impression that if you just correct for conditionality (instead of looking at marginal outcomes) then you will get equal probabilities for the last outcome. However, if the OP were to do this with his simulations he would still get wildly different conditional probabilities (due to his error). – Ben Sep 12 '21 at 02:21
  • @Ben I think there are larger problems with OP's simulation than the one you point out. His question is about a sequence of 4 flips, whereas the simulation for some reason looks at a sequence of a million flips. And granted, since flips are independent it shouldn't matter much, but it signals to me that OP has made a more basic error somewhere in the construction of the problem in addition to the implementation of the solution. In any case, let OP decide which is a more useful answer to him. I'm going to leave my answer up for posterity – Demetri Pananos Sep 12 '21 at 02:24
  • That is fine, and obviously other readers can make their own judgment on the matter; I just commented because I prefer to give a reason when I give a downvote rather than it being a mystery. – Ben Sep 12 '21 at 02:28
  • 1
    @Ben I appreciate the transparency – Demetri Pananos Sep 12 '21 at 02:28
  • And for what it's worth, I've just had a look at some of your other answers to upvote, as compensation. ; ) – Ben Sep 12 '21 at 02:37
  • 2
    @Ben I don't think that is necessary, please don't feel obligated to compensate for what is a genuine and respectful disagreement. Though thanks, the fake internet points make me feel warm and fuzzy – Demetri Pananos Sep 12 '21 at 02:39