I have a population - in this case they are phone calls into a call center.
Each call falls into exactly one of approximately 200 'Problems' (an example of a Problem is 'I cannot connect to the internet').
I would like to predict the top 10 most common problems across the entire population based on analysis of a sample set of phone calls. How many phone calls do I need to categorize in order to generate this list with different confidence levels and population sizes?
Ideally I would like to find a formula which allows me to derive sample size from population size, problem set size and confidence level.
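To make the question concrete, here is a minimal simulation sketch of the quantity I am after: for a given sample size, how often does the sample's top 10 match the true top 10? Everything in it is an assumption for illustration only. I do not know the real problem frequencies, so `zipf_weights` invents a Zipf-like distribution, and the function names, the `exponent` parameter and the trial count are all hypothetical. It also samples with replacement, which treats the population as effectively infinite, so population size does not appear in it at all.

```python
import random
from collections import Counter


def zipf_weights(num_problems, exponent=1.0):
    """Hypothetical problem frequencies: rank r gets weight 1 / r**exponent."""
    raw = [1.0 / (rank ** exponent) for rank in range(1, num_problems + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def top_k_match_rate(weights, sample_size, k=10, trials=100, seed=0):
    """Fraction of simulated samples whose top-k set equals the true top-k set."""
    rng = random.Random(seed)
    problems = range(len(weights))
    # True top-k: the k problems with the largest assumed frequency.
    true_top = set(sorted(problems, key=lambda i: weights[i], reverse=True)[:k])
    hits = 0
    for _ in range(trials):
        # Draw calls with replacement (assumes sample << population).
        sample = rng.choices(problems, weights=weights, k=sample_size)
        # Ties in the sample counts are broken by first appearance.
        sample_top = {p for p, _ in Counter(sample).most_common(k)}
        hits += sample_top == true_top
    return hits / trials


if __name__ == "__main__":
    weights = zipf_weights(200)
    for n in (100, 1000, 5000):
        print(n, top_k_match_rate(weights, n))
```

The match rate printed for each sample size is an empirical stand-in for the confidence level; what I am hoping for is a formula that gives the sample size directly, without having to guess the distribution and simulate.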
Disclosure:
- I am a programmer, not a mathematician.
- I hope I have asked this question in the correct manner!
- I did some reading before asking this question. Most of the content I can find is about estimating p(x) in a population, whereas my problem has more dimensions. Sorry if this is a duplicate and I am just not smart enough to realize!