I have a dataset consisting of an ordered series of categorical samples that are not independent of one another: every sample has an elevated chance of being equal to the sample before it, i.e. there is a tendency for 'streaks' of the same value. I want to take a subset of this dataset using a selection function, call it S(), and then draw from this subset a uniform sample of, say, 10% of all values in the original dataset. I then want to know whether the distribution of this result matches the distribution of the original dataset; in other words, I want to measure whether my selection function S() changes the distribution, or, put yet another way, whether my resulting selection is an accurate representation of the whole population.
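To make this concrete, here is a minimal toy sketch of the setup (the category labels, the streak probability, and the selection function are all made up for illustration; my real S() is more complicated):

```python
import random

# Toy generator for a 'streaky' categorical series: with probability
# p_repeat the next sample copies the previous one, otherwise a fresh
# category is drawn uniformly at random.
def streaky_series(n, categories=("A", "B", "C"), p_repeat=0.7, seed=42):
    rng = random.Random(seed)
    series = [rng.choice(categories)]
    for _ in range(n - 1):
        if rng.random() < p_repeat:
            series.append(series[-1])          # continue the streak
        else:
            series.append(rng.choice(categories))
    return series

data = streaky_series(10_000)

# S(): placeholder selection function -- say it keeps every sample
# at an even index (purely for illustration).
subset = [x for i, x in enumerate(data) if i % 2 == 0]

# Uniform sample of 10% of the *original* dataset's size.
rng = random.Random(0)
sample = rng.sample(subset, k=len(data) // 10)
```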
For a normally distributed quantity, I'd normally get the sample size for a given confidence level and margin of error from an online 'sample size calculator' and call it a day. But in the case described above I'm not quite sure how the pieces fit together: does it matter that my data is categorical? Does the order of the samples matter? Is this an appropriate case for the chi-square test of independence?
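Continuing from the snippet above, this is the kind of comparison I'm imagining; I've used a chi-square goodness-of-fit test here rather than the independence test, though I'm not sure which (if either) is appropriate, or whether the 'streakiness' invalidates it:

```python
from collections import Counter
from scipy.stats import chisquare

categories = sorted(set(data))
sample_counts = Counter(sample)
full_counts = Counter(data)

# Observed category counts in the sample vs. the counts we'd expect
# if the sample followed the full dataset's proportions exactly.
observed = [sample_counts[c] for c in categories]
expected = [full_counts[c] / len(data) * len(sample) for c in categories]

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {stat:.2f}, p = {p_value:.3f}")
```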
I think my question reduces to 'how can I compare the distributions of two series of ordered categorical values?', but I'm not even sure that question makes sense, hence my admittedly clumsy description above. Can anyone clarify how I could go about this?