How can I calculate a sample size for a ranked list of items across a population?

Question

I have a population: in this case, phone calls into a call center.

Each call falls into one of approximately 200 'Problems' (an example of a Problem is 'I cannot connect to the internet').

I would like to predict the top 10 most common problems across the entire population based on analysis of a sample set of phone calls. How many phone calls do I need to categorize in order to generate this list with different confidence levels and population sizes?

Ideally I would like to find a formula which allows me to derive sample size from population size, problem set size and confidence level.

Disclosure:

- I am a programmer, not a mathematician
- I hope I have asked this question in the correct manner!
- I have done reading prior to asking this question, most content I can find is about estimating p(x) in a population, my problem has more dimensions. Sorry if this is a duplicate and I am just not smart enough to realize!

No need to apologize! I have flagged the question for a moderator to migrate it to stats.SE. — , Dec 16 '11 at 17:00
Perhaps, my [answer](http://stats.stackexchange.com/a/19142/7199) to "[Sample size for a variable number of answers](http://stats.stackexchange.com/q/19120/7199)" is relevant. — varty, Dec 18 '11 at 03:27

score 1 · Answer 1 · answered Dec 17 '11 at 13:31

The Chernoff Bound looks to be out of my league, and for all I know it may be out of yours. A more manageable method would use chi-square tests or tests of the difference between proportions. If you specify the size of the difference you want to detect, the significance level, and the statistical power you want, you can use a software package (such as the free G*Power) or an online power calculator to estimate the needed sample size.

Example: is item 1's percentage statistically significantly different from item 2's? Say you expect percentages of 20 and 9, respectively, and you want to see whether they differ significantly at the .05 level (95% confidence). Plug those numbers into the calculator and you'll obtain a required sample size that at least applies to those who choose items 1 or 2.
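For anyone who would rather script this than use a calculator, here is a minimal sketch of the textbook normal-approximation formula for comparing two proportions. It is not taken from any particular calculator. The function name `n_per_group` and the 80% power default are my own assumptions (the example above only fixes the .05 level), and the formula treats the two percentages as coming from two independent groups, so take the result as a rough guide.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided z-test of p1 vs p2.

    Normal approximation; the 0.80 power default is an assumed
    convention, not something specified in the answer above.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    p_bar = (p1 + p2) / 2                          # pooled proportion under H0
    root = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil(root ** 2 / (p1 - p2) ** 2)


# The 20% vs 9% example at the .05 level.
print(n_per_group(0.20, 0.09))
```

Shrinking the gap between the two percentages drives the required size up quickly, because the difference is squared in the denominator.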

Now here's the part that purists are likely to question: you repeat the process to compare item 2 with item 3, and so on. The fact that you'll be doing multiple tests, each partly dependent on the previous, makes this a rough workaround rather than an ideal method. But it may be good enough for government work.
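The repeat-for-each-pair idea can be sketched as a loop over adjacent ranks. Everything specific here is hypothetical: the `guessed_shares` values are made up for illustration, the 80% power is an assumed default, and the optional `bonferroni` flag (dividing the significance level by the number of comparisons) is a standard multiple-testing adjustment that I am adding, not something the answer proposes.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided z-test of p1 vs p2 (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    root = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil(root ** 2 / (p1 - p2) ** 2)


def adjacent_pair_sizes(shares, alpha=0.05, power=0.80, bonferroni=False):
    """Required n for each pair of neighbouring ranks.

    Shares must be strictly decreasing: two equal shares can never be
    told apart and would cause a division by zero.
    """
    pairs = list(zip(shares, shares[1:]))
    if bonferroni:
        alpha = alpha / len(pairs)  # stricter level per comparison
    return [(p1, p2, n_per_group(p1, p2, alpha, power)) for p1, p2 in pairs]


# Hypothetical guessed shares for the top-ranked problems, in rank order.
guessed_shares = [0.20, 0.09, 0.07, 0.06]

sizes = adjacent_pair_sizes(guessed_shares)
for p1, p2, n in sizes:
    print(f"{p1:.2f} vs {p2:.2f}: about {n} per group")
print("binding requirement:", max(n for _, _, n in sizes))
```

The pair with the closest shares sets the overall requirement, which is why neighbours far down the list, where the percentages bunch together, are the expensive ones to separate.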

score 0 · Answer 2 · answered Dec 16 '11 at 17:22

I think you can use the Chernoff Bound
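One way this could work, as a sketch under my own assumptions rather than a worked-out method: use the Hoeffding form of the Chernoff bound on each problem's sample share, then a union bound over all k problems. With calls sampled at random, every share is within ε of its true value with probability at least 1 − δ once n ≥ ln(2k/δ) / (2ε²). If the true shares of the 10th and 11th problems differ by more than 2ε, the sample's top 10 is then the true top 10. The function name and the example values of ε and δ are hypothetical.

```python
from math import ceil, log


def hoeffding_sample_size(num_categories, epsilon, delta):
    """Smallest n with n >= ln(2k/delta) / (2 * epsilon**2).

    With n randomly sampled calls, all k category shares are within
    epsilon of their true values with probability at least 1 - delta
    (Hoeffding's inequality plus a union bound over the categories).
    """
    return ceil(log(2 * num_categories / delta) / (2 * epsilon ** 2))


# 200 problems, shares accurate to within 1 percentage point, 95% confidence.
print(hoeffding_sample_size(200, 0.01, 0.05))
```

Population size does not appear in the bound, and the number of problems only enters through a logarithm, so the required precision ε is what dominates. The bound is conservative, so the true requirement may be considerably smaller.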

Nick
