I've spent the past decade or so in project management and have done little statistics in that time, so please forgive any incorrect terminology.

I'm working on a problem involving successive runs. For example, if I flip a coin 100 times, what is the probability that I will flip heads at least 5 times in a row? I've been poking around trying to figure this out, and I thought the solution would be 1 minus the sum of the probabilities of flipping heads once, twice, three times, and four times in a row:

$$ p=1-\left(\left(\frac{1}{2}\right)^1 + \left(\frac{1}{2}\right)^2 + \left(\frac{1}{2}\right)^3+ \left(\frac{1}{2}\right)^4\right) $$

My thinking is that having a run of five or more is the opposite of having a run of 1, 2, 3, or 4. When I do the math on the above, I get a result of 6.25%.

This number feels too low to me. Am I correct in my thinking that the probability of five or more is just the "opposite probability" of a run of 1, 2, 3, or 4?

Thank you.

tendim
    This is a duplicate of a previous question [here](https://stats.stackexchange.com/questions/362470/362604#362604). The correct probability is $p=0.8101096$. – Ben Sep 20 '20 at 06:38

1 Answer

Runs of length five or more in tossing a fair coin are fairly common.

Let's look at one 100-toss experiment, letting 0 = Tail and 1 = Head:

    set.seed(919)
    x = rbinom(100, 1, .5);  x
      [1] 1 1 1 0 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1
     [26] 1 1 0 0 1 0 1 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 0 0 1
     [51] 0 1 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 1 1 0 1 1 1 1
     [76] 0 1 1 1 0 0 1 0 1 1 0 0 1 0 1 1 1 0 0 1 1 1 0 1 0

There seem to be several runs of 5 or more. In R, the rle procedure counts lengths and values (H or T) of runs.

    rle(x)
    Run Length Encoding
      lengths: int [1:44] 3 2 1 2 1 5 2 8 3 2 ...
      values : int [1:44] 1 0 1 0 1 0 1 0 1 0 ...

So it is easy to count the number of runs of five or more.

    sum(rle(x)$lengths >= 5)    # counts runs of Heads and runs of Tails alike
    [1] 5
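Since the question asks specifically about Heads, the same rle output can be filtered by value. The following is only a minimal sketch: the 100 tosses shown above are typed in by hand so that it does not depend on the random number generator, and the names r, long_head_runs and long_tail_runs are illustrative, not part of the original session.

```r
# The 100 tosses printed above, copied in so the snippet runs on its own.
x <- c(1,1,1,0,0,1,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,
       1,1,0,0,1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,1,
       0,1,0,0,0,1,0,0,0,0,0,1,1,0,0,0,0,1,1,1,0,1,1,1,1,
       0,1,1,1,0,0,1,0,1,1,0,0,1,0,1,1,1,0,0,1,1,1,0,1,0)

r <- rle(x)

# Keep only runs of length >= 5, split by whether the run is Heads (1) or Tails (0).
long_head_runs <- sum(r$lengths >= 5 & r$values == 1)
long_tail_runs <- sum(r$lengths >= 5 & r$values == 0)

c(heads = long_head_runs, tails = long_tail_runs)
```

In this particular experiment only one of the five long runs is a run of Heads (the six 1s at tosses 42 through 47); the other four are runs of Tails.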

Now look at 100,000 experiments of 100 tosses each, letting nr.5 be the number of runs of length at least five in each experiment.

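    set.seed(2010)
    # NB: prob = .6 here, so this simulates a coin slightly biased toward Heads
    nr.5 = replicate(10^5, sum(rle(rbinom(100,1,.6))$lengths >= 5))
    table(nr.5)/10^5    # proportions: divide by the number of experiments
    nr.5
          0       1       2       3       4       5 
    0.01068 0.06251 0.16115 0.24886 0.24166 0.16359 
          6       7       8       9      10 
    0.07757 0.02613 0.00663 0.00112 0.00010 
    mean(nr.5)
    [1] 3.62843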

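So, on average there are about 3.6 runs of length 5 or more in 100 tosses. Note, however, that the simulation above uses Head probability .6 rather than .5, so this figure is for a slightly biased coin, and most of its long runs are runs of Heads. For a fair coin the expected number is $1/16 + 95/32 \approx 3.03$ (a run of length five or more starts at toss 1 with probability $2/2^5$ and at each of tosses 2 through 96 with probability $1/2^5$), and by symmetry about 1.5 of them are runs of Heads.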
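The quantity asked for in the question, the chance of at least one run of five or more Heads with a fair coin, can be estimated the same way. This is a sketch under stated assumptions: has_head_run is a hypothetical helper name, the seed is arbitrary, and 10^4 replications are used only to keep it quick.

```r
set.seed(2020)  # arbitrary seed, chosen only for reproducibility

# TRUE if n fair tosses contain at least one run of k or more Heads (1 = Head).
has_head_run <- function(n = 100, k = 5) {
  r <- rle(rbinom(n, 1, 0.5))
  any(r$lengths >= k & r$values == 1)
}

# Proportion of simulated 100-toss experiments with such a run.
est <- mean(replicate(10^4, has_head_run()))
est
```

With this many replications the estimate should fall within a percentage point or so of the exact value 0.8101096 quoted in the comment on the question.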

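Somewhat more analytically, the probability of getting five Heads in a particular 5-toss sequence is $1/2^5 = 1/32.$ As a rough heuristic, treating successive 5-toss windows as independent suggests a (geometric) waiting time of about 32 tosses for the first run of at least five Heads. Overlapping windows are not independent, however, and the exact expected wait for five Heads in a row is $2^6 - 2 = 62$ tosses. Either way, we would not be surprised to see at least one such run in most 100-toss experiments.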

[For an exact analytic result you'll have to make careful definitions to deal with the fact that not all long runs are of length 5. Does a run of ten Heads count as one run of more than 5? Or two runs of 5? How many runs of five are there in a run of seven?]
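For the narrower event in the question (at least one run of five or more Heads somewhere in the 100 tosses) those counting ambiguities do not arise, and the exact probability can be computed by tracking the length of the current Head streak from toss to toss. The function below is a sketch of that idea, not the method used in the linked answer; prob_head_run and its arguments are hypothetical names.

```r
# P(at least one run of k or more Heads in n tosses), Head probability p.
# state[j + 1] holds P(current Head streak is j and no run of k has occurred yet),
# for j = 0, ..., k - 1.
prob_head_run <- function(n, k, p = 0.5) {
  state <- c(1, rep(0, k - 1))           # before any toss the streak is 0
  for (i in seq_len(n)) {
    new_state <- numeric(k)
    new_state[1] <- sum(state) * (1 - p) # a Tail resets the streak to 0
    if (k > 1) {
      new_state[2:k] <- state[1:(k - 1)] * p  # a Head extends streak j to j + 1
    }
    # A Head from streak k - 1 completes a run of k; that mass is dropped here.
    state <- new_state
  }
  1 - sum(state)                         # everything not still "run-free"
}

prob_head_run(100, 5)
```

For 100 fair tosses and k = 5 this should reproduce the 0.8101096 given in the comment on the question; as a quick sanity check, prob_head_run(5, 5) is exactly 1/32.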

This Wikipedia article has formulas for run lengths and numbers of runs.

BruceET