Let's say I have a randomly generated sequence of the letters A, C, T and G that is 1000 letters long. Each letter occurs with probability 25%. What is the probability that the sequence 'AAAAA' occurs N times within the 1000-letter sequence?
The difficulty I have is that the trials are dependent; otherwise this could be modeled nicely with a binomial or Poisson distribution. If 'AAAAA' happens to occur at position X, then the probability of it also occurring at position X + 1 is 0.25, not 0.25^5, because only one more 'A' is needed.
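To make the dependence concrete, here is a minimal sketch of one possible way to get exact numbers. It tracks the number of trailing A's together with the count of occurrences so far, one letter at a time. It assumes the letters are independent and that overlapping occurrences are counted separately (so 'AAAAAA' counts as two). The function name `occurrence_distribution` and its parameters are made up for illustration, and this is not necessarily the best approach.

```python
def occurrence_distribution(length=1000, run=5, p=0.25, max_count=10):
    """Return (probs, tail): probs[c] is the probability of exactly c
    overlapping occurrences of a run of `run` A's for c = 0..max_count,
    and tail is the probability of more than max_count occurrences."""
    top = max_count + 1  # index of the overflow bucket ("more than max_count")
    # state[r][c]: probability that the sequence so far ends in r A's
    # (r capped at run - 1) and contains c occurrences.
    state = [[0.0] * (top + 1) for _ in range(run)]
    state[0][0] = 1.0
    for _ in range(length):
        new = [[0.0] * (top + 1) for _ in range(run)]
        for r in range(run):
            for c in range(top + 1):
                mass = state[r][c]
                if mass == 0.0:
                    continue
                # Any letter other than A resets the trailing run.
                new[0][c] += mass * (1 - p)
                if r == run - 1:
                    # One more A completes an occurrence ending here;
                    # the capped run length stays at run - 1.
                    new[r][min(c + 1, top)] += mass * p
                else:
                    new[r + 1][c] += mass * p
        state = new
    totals = [sum(state[r][c] for r in range(run)) for c in range(top + 1)]
    return totals[:top], totals[top]


if __name__ == "__main__":
    probs, tail = occurrence_distribution()
    for n, prob in enumerate(probs):
        print(n, prob)
    print("more than", len(probs) - 1, tail)
```

If non-overlapping occurrences are wanted instead, the transition out of the last state would have to reset the run to zero after each match, which changes the resulting distribution.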
Thank you.