Count the number of occurrences of a specified string within a random text.
Statistical problems involving "string-counts" occur when we count the number of occurrences of a specified string (a vector of symbols) within a text (another vector of symbols). Any distribution for an underlying random text induces a corresponding probability distribution for the string-count, and the properties of this distribution are often of interest in statistical problems. Probabilities pertaining to string-counts are of interest in the analysis of DNA sequences, and in other problems where we wish to determine whether a given string has occurred more often than would be expected "at random".
The distribution of the string-count for a random text is closely related to a number of mathematical and statistical subjects, including Markov chains, deterministic finite automata (DFAs), probability generating functions, and recursive computation formulae. Statistical models for string-counts may use finite-order Markov chains for the symbols in the text, or they may involve simpler models (e.g., models where symbols in the text are IID random variables).
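The link between DFAs and recursive computation can be made concrete. Below is a minimal sketch, assuming an IID model for the symbols and counting overlapping occurrences. It builds a KMP-style automaton whose state is the length of the longest prefix of the string that ends the text read so far. It then pushes the joint distribution of (state, count) forward one symbol at a time to obtain the exact distribution of the string-count. The function names and the `probs` dictionary format are illustrative choices, not an established API.

```python
from fractions import Fraction


def kmp_failure(pattern):
    """fail[i] = length of the longest proper border of pattern[:i + 1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def build_dfa(pattern, alphabet):
    """Transition table of a DFA with states 0..m (m = len(pattern)).

    State s means: the last s symbols read equal pattern[:s].
    Reaching state m signals one (possibly overlapping) occurrence.
    """
    m = len(pattern)
    fail = kmp_failure(pattern)
    delta = {}
    for state in range(m + 1):
        for a in alphabet:
            k = state
            if k == m:
                k = fail[m - 1]  # after a full match, fall back to the longest border
            while k > 0 and pattern[k] != a:
                k = fail[k - 1]
            if pattern[k] == a:
                k += 1
            delta[state, a] = k
    return delta


def string_count_distribution(pattern, probs, n):
    """Exact P(count = c) for a non-empty pattern in an IID text of length n.

    probs maps each symbol to its probability (assumed to sum to 1).
    """
    m = len(pattern)
    delta = build_dfa(pattern, list(probs))
    dist = {(0, 0): Fraction(1)}  # (automaton state, count so far) -> probability
    for _ in range(n):
        new = {}
        for (state, count), p in dist.items():
            for a, pa in probs.items():
                s2 = delta[state, a]
                c2 = count + (s2 == m)  # increment when a full match is completed
                new[s2, c2] = new.get((s2, c2), 0) + p * pa
        dist = new
    # Marginalise out the automaton state.
    result = {}
    for (_, count), p in dist.items():
        result[count] = result.get(count, 0) + p
    return dict(sorted(result.items()))


coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
print(string_count_distribution("HH", coin, 3))
# {0: Fraction(5, 8), 1: Fraction(1, 4), 2: Fraction(1, 8)}
```

For "HH" in three fair coin flips, the text HHH contains two overlapping occurrences. That is why a count of 2 has positive probability. A finite-order Markov model for the text could be handled the same way by adding the previous symbol(s) to the state.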
The string-count tag is suitable for any problem involving analysis of the number of occurrences of a string of symbols within a random text. It is also suitable for related problems where we look at the number of symbols in a text that are needed until the occurrence of a string. Note that the "string-count" is closely related to "runs" of symbols in a text.
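For the waiting-time version of the problem, a well-known closed form exists when the symbols are IID. Let T be the number of symbols read until the string first appears. Then E[T] is the sum, over every k for which the length-k prefix of the string equals its length-k suffix, of 1 / P(prefix of length k). The sketch below states this formula without proof and evaluates it exactly. The function name and the `probs` format are illustrative assumptions.

```python
from fractions import Fraction
from math import prod


def expected_waiting_time(pattern, probs):
    """E[number of IID symbols read until pattern first appears].

    Sums 1 / P(pattern[:k]) over every k where the length-k prefix
    equals the length-k suffix (k = len(pattern) always qualifies).
    """
    m = len(pattern)
    total = Fraction(0)
    for k in range(1, m + 1):
        if pattern[:k] == pattern[m - k:]:
            total += 1 / prod(probs[s] for s in pattern[:k])
    return total


coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
print(expected_waiting_time("HH", coin), expected_waiting_time("HT", coin))  # 6 4
```

Self-overlapping strings such as "HH" take longer to appear on average than non-overlapping ones of the same length, even though both have the same expected count in a text of fixed length.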