
Suppose that a very long Bernoulli process gives a sequence with possible values: $A$ with probability $p$, and $B$ with probability $1-p$. The expected fraction of contiguous sequences of length $k$ that only contain $A$s must be $p^k$.

I ran some numerical experiments in which I recorded only the lengths of the maximal runs of contiguous $A$s (e.g., for $\dots BAAAB\dots$ I write down "one run of length 3") and counted them. The frequency of those run lengths also shows an exponential dependence on the run length.

Why does this happen, and how are the two exponentials related?

user1420303
  • What does "exponential dependence" mean? There should not be any regular distribution, i.e., recall the gambler's fallacy. Maybe you are looking at noise? – msuzen Sep 06 '21 at 15:04
  • @MehmetSuzen I was trying to say that, if $C(l)$ is the number of runs of $l$ consecutive $A$s bounded by $B$s, then $C(l)$ is an exponential function of $l$. It may be wrong; I am not a mathematician and my background in statistics is very elementary. However, that is what I found in my numerical experiments. Am I wrong? – user1420303 Sep 06 '21 at 15:10
  • This sounded like the gambler's fallacy: in a long sequence, the ratio of As to Bs will approach 0.5, but their difference will grow in an "exponential"-like way. See https://stats.stackexchange.com/questions/204397/regression-to-the-mean-vs-gamblers-fallacy/204417 – msuzen Sep 06 '21 at 15:32

1 Answer


The number of consecutive $A$'s in each run of $A$'s has a geometric distribution.

https://en.wikipedia.org/wiki/Geometric_distribution

(However, note that the parameter $p$ there is equivalent to your $1-p$.)

There is at least one $A$ by construction (that is the condition for counting a run at all), and each additional $A$ occurs with probability $p$, so the probability of a run of a given length is $p$ times the probability of a run one shorter.

That is, if $X$ is the length of a run, $P(X=t) = p\cdot P(X=t-1)$.

Hence $P(X=t)=P(X=1)\cdot p^{t-1}$. Summing over $t \ge 1$ gives a total probability of $P(X=1)\cdot \frac{1}{1-p}$ (the sum of a geometric series), which must equal 1, so $P(X=1) = 1-p$.

Hence the probability that the run length is $t$ is $(1-p)\, p^{t-1}$ for $t = 1, 2, \dots$
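One way to connect this to the $p^k$ in the question is sketched below. It assumes independent trials and ignores edge effects at the two ends of the sequence. A run of exactly $t$ $A$'s starts at a given position when a $B$ is followed by $t$ $A$'s and then another $B$:

$$P(\text{run of length } t \text{ starts here}) = (1-p)\,p^{t}\,(1-p).$$

Normalizing over all run lengths recovers the same p.m.f.:

$$\frac{(1-p)^2\,p^{t}}{\sum_{s=1}^{\infty}(1-p)^2\,p^{s}} = \frac{(1-p)^2\,p^{t}}{(1-p)\,p} = (1-p)\,p^{t-1}.$$

So both quantities decay by the same factor $p$ per extra $A$: the fraction of all-$A$ windows of length $k$ is $p^k$, while the fraction of runs with length at least $k$ is $P(X \ge k) = p^{k-1}$.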

The shape of the p.m.f. is indeed a discrete version of an exponential curve.
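A quick simulation can be used to check this numerically. The following is only a minimal sketch: the function names `run_lengths` and `simulate`, the value $p = 0.6$, the sequence length, and the seed are all illustrative assumptions, not something taken from the question.

```python
import random
from collections import Counter


def run_lengths(seq, symbol="A"):
    """Return the lengths of the runs of `symbol` in `seq` that end in a different symbol."""
    lengths = []
    current = 0
    for s in seq:
        if s == symbol:
            current += 1
        elif current > 0:
            lengths.append(current)
            current = 0
    # A run still open at the end of `seq` is cut off by the end of the
    # sequence rather than by a B, so it is skipped.
    return lengths


def simulate(p, n, seed=0):
    """Empirical run-length frequencies for n Bernoulli trials with P(A) = p."""
    rng = random.Random(seed)
    seq = ["A" if rng.random() < p else "B" for _ in range(n)]
    counts = Counter(run_lengths(seq))
    total = sum(counts.values())
    return {t: c / total for t, c in sorted(counts.items())}


if __name__ == "__main__":
    p = 0.6  # illustrative value
    freqs = simulate(p, 200_000)
    print("t  empirical  (1-p)*p^(t-1)")
    for t in range(1, 8):
        print(t, round(freqs.get(t, 0.0), 4), round((1 - p) * p ** (t - 1), 4))
```

For a long sequence the two printed columns should agree closely, and the ratio between successive rows should be close to $p$.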

Glen_b