
I have a string of length n, composed of 20 characters that occur with equal probability. What is the probability that a regular expression pattern such as 'WP[^WFHY]{5}W' occurs in it by chance? In case you are not familiar with Python regular expressions, [^WFHY]{5} means any 5 characters that are not W, F, H or Y.
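To make the first part concrete, here is a minimal sketch of one way the probability of at least one match could be computed exactly. It assumes the characters are independent and identically distributed, and it tracks partial matches with a bitmask (the Shift-And idea) while stepping through the string one character at a time. The names `PATTERN` and `match_probability` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
# WP[^WFHY]{5}W written as one set of allowed characters per position.
OTHER = set(ALPHABET) - set("WFHY")
PATTERN = [{"W"}, {"P"}] + [OTHER] * 5 + [{"W"}]


def match_probability(n, pattern=PATTERN, probs=None):
    """Probability that the pattern occurs at least once in a random
    string of length n (assumption: independent characters)."""
    if probs is None:
        probs = {c: 1.0 / len(ALPHABET) for c in ALPHABET}
    m = len(pattern)
    # char_mask[c] has bit i set when c is allowed at pattern position i.
    char_mask = {
        c: sum(1 << i for i, allowed in enumerate(pattern) if c in allowed)
        for c in probs
    }
    accept = 1 << (m - 1)
    # Bit i of a state means "the last i+1 characters match pattern[0..i]".
    states = {0: 1.0}  # state -> probability, with no full match seen yet
    hit = 0.0          # probability mass that has already seen a full match
    for _ in range(n):
        nxt = defaultdict(float)
        for mask, p in states.items():
            for c, pc in probs.items():
                new = ((mask << 1) | 1) & char_mask[c]
                if new & accept:
                    hit += p * pc      # full match: absorb this mass
                else:
                    nxt[new] += p * pc
        states = nxt
    return hit


if __name__ == "__main__":
    for n in (8, 100, 1000):
        print(n, match_probability(n))
```

For n = 8 there is a single window, so the result should equal (1/20)^3 · (16/20)^5 = 4.096e-5. Passing a different `probs` dictionary would cover unequal letter frequencies under the same independence assumption.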

Furthermore, suppose I have a database of 17000 sequences. How do I calculate the same probability given that the length of the string varies from sequence to sequence? I assume we can't concatenate all the sequences as in the first calculation, because a match can't span two sequences.

Lastly, what if the characters do not occur with equal probability? How do I estimate the frequency of each letter and factor it into the calculation?
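One possible reading of this last part, sketched under the assumption that positions are independent: estimate each letter's frequency by counting over the whole database, then multiply the per-position probabilities of the pattern. The function names below are hypothetical and the input sequences are toy data.

```python
from collections import Counter


def letter_frequencies(sequences):
    """Empirical frequency of each letter across all sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    total = sum(counts.values())
    return {c: k / total for c, k in counts.items()}


def window_probability(freq):
    """Chance that one 8-character window matches WP[^WFHY]{5}W,
    assuming independent positions with the given letter frequencies."""
    w = freq.get("W", 0.0)
    p = freq.get("P", 0.0)
    excluded = sum(freq.get(c, 0.0) for c in "WFHY")
    return w * p * (1.0 - excluded) ** 5 * w


if __name__ == "__main__":
    toy = ["WPAAAAAW", "ACDEFGHIKLMNPQRSTVWY"]  # toy data, not real sequences
    freq = letter_frequencies(toy)
    print(window_probability(freq))
```

With all 20 letters at frequency 1/20 this reduces to the equal-probability value (1/20)^3 · (16/20)^5.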

Characters:

A
C
D
E
F
G
H
I
K
L
M
N
P
Q
R
S
T
V
W
Y
Ben
  • This is a generalization of [Penney's Game](https://stats.stackexchange.com/questions/12174) and can be solved using the techniques described in the linked thread. – whuber Oct 04 '19 at 15:17
  • How should this abandoned question be resolved? Should it be closed as a duplicate? – mickmackusa Nov 01 '21 at 09:05

0 Answers