I have a long string of characters like this:
'ATCGCGCGCGATCGACGCGTACGTCGGATCTA.....'
I know that, for example, the substring 'ATCG' occurs X times in this string. How can I test statistically whether this count differs significantly from what would be expected by chance? The expected count could be calculated from the frequency of each character, but I am not sure whether I should use a chi-square test, a binomial test, or some other test to assess the significance of the difference between the observed and expected counts. If a chi-square test, how should I calculate the degrees of freedom? If a binomial test, how should I choose the value of 'n' in the binomial formula?
binom.test(x, n, p = 0.5, alternative = c("two.sided", "less", "greater"),
conf.level = 0.95)
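To make the question concrete, here is a minimal sketch of the binomial version I have in mind. It rests on assumptions I am not sure about: that characters are independent, so p is the product of the character frequencies in the substring, and that n is the number of possible start positions (string length minus substring length plus 1). Overlapping windows are not independent, so the binomial can only be an approximation here. The variable names (s, motif, freq, k, n, x, p) are just for illustration.

```r
# Toy sequence; the real string is much longer.
s <- "ATCGCGCGCGATCGACGCGTACGTCGGATCTA"
motif <- "ATCG"

# Observed frequency of each character in the string
chars <- strsplit(s, "")[[1]]
freq <- table(chars) / length(chars)

# Assumed n: number of start positions where the motif could occur
k <- nchar(motif)
n <- nchar(s) - k + 1

# Observed count x, allowing overlapping occurrences
starts <- seq_len(n)
x <- sum(substring(s, starts, starts + k - 1) == motif)

# Assumed p: product of character frequencies (independence assumption)
p <- as.numeric(prod(freq[strsplit(motif, "")[[1]]]))

# Approximate test only: windows overlap, so trials are not independent
binom.test(x, n, p = p, alternative = "two.sided")
```

Is this choice of n and p reasonable, or does the dependence between overlapping positions make the binomial test invalid?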
I would appreciate any hints.