Say I have a binary sequence of length N (e.g. HTHHTHTTTTTTH). How do I find the number of ways to get a NON-overlapping substring of length K (e.g. HHTH), where 1<=K<=N?
It's possible you can multiple repeats of the substring if it's short enough compared to the full string length N. Call the number of repeats M.
One obvious piece of information is the max number of possible repeats (an upper bound on M), is floor(N/K), since e.g. for N=7, K=3, you can only have up to 2 simultaneous occurrences of the substring (2*3 <= 7).
So what I'm looking for is f(N,K,M) such that f gives me the number of ways I can get my length K substring repeated M times in my length N string.
I imagine someone proficient in bioinformatics should be able to solve this pretty easily, as the analogous question for a DNA sequence is a superset of this problem (DNA has A,C,G,T bases instead of just heads/tails, and the basepairs have different frequencies) so it's a more general problem. In fact, if someone wanted to simply derive the analogous results for DNA sequencing, where we have a categorical distribution, that would be acceptable too because it is more general (see e.g. https://en.wikipedia.org/wiki/Categorical_distribution)