3

I have a population: in this case, phone calls into a call center.

Each call falls under one of approximately 200 'Problems' (for example, 'I cannot connect to the internet').

I would like to predict the top 10 most common problems across the entire population based on analysis of a sample set of phone calls. How many phone calls do I need to categorize in order to generate this list with different confidence levels and population sizes?

Ideally I would like to find a formula which allows me to derive sample size from population size, problem set size and confidence level.

Disclosure:

- I am a programmer, not a mathematician.
- I hope I have asked this question in the correct manner!
- I have done some reading prior to asking this question. Most content I can find is about estimating p(x) in a population, whereas my problem has more dimensions. Sorry if this is a duplicate and I am just not smart enough to realize!

  • apologies - I did not even know that SE existed. –  Dec 16 '11 at 16:59
  • No need to apologize! I have flagged the question to the moderator for migration to stats.SE. –  Dec 16 '11 at 17:00
  • Perhaps, my [answer](http://stats.stackexchange.com/a/19142/7199) to "[Sample size for a variable number of answers](http://stats.stackexchange.com/q/19120/7199)" is relevant. – varty Dec 18 '11 at 03:27

2 Answers

1

The Chernoff Bound looks to be out of my league, and for all I know it may be out of yours. A more manageable method would use chi-square tests or tests of the difference between proportions. If you specify the size of the difference you want to detect and the confidence level at which you want results, you can use various software packages (including the free G*Power) or online power calculators to estimate the needed sample size.

Example: is item 1's percentage statistically significantly different from item 2's? Suppose the percentages are 20 and 9, respectively, and you want to see whether they differ significantly at the .05 level (95% confidence). You plug those numbers, along with a target power, into the calculator and you'll obtain a required sample size that applies at least to those who choose items 1 or 2.
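For readers who prefer code to a calculator, here is a minimal sketch of the kind of computation such a tool performs. It assumes the standard normal-approximation formula for a two-sided test of two independent proportions and a target power of 0.80. The function name `n_per_group` and the power default are my own assumptions, not something taken from any particular package, and a real calculator may use a slightly different formula.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided z-test of p1 vs p2.

    Assumption: normal approximation with unpooled variances,
    n = (z_{1-alpha/2} + z_{power})^2 * (p1*q1 + p2*q2) / (p1 - p2)^2.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# The 20% vs 9% example from above, at the .05 level.
print(n_per_group(0.20, 0.09))
```

Note that the required size grows quickly as the two percentages get closer, so the answer is driven by the closest pair you care about.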

Now here's the part that purists are likely to question: you repeat the process to compare item 2 with item 3, and so on. The fact that you'll be doing multiple tests, each partly dependent on the previous, makes this a rough workaround rather than an ideal method. But it may be good enough for government work.
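The repeated pairwise comparison could be sketched as below. This is only an illustration under assumptions: the shares are made-up numbers, `pair_n` uses the normal-approximation formula for two independent proportions with 0.80 power, and the optional Bonferroni split of the significance level is my own addition as one crude way to acknowledge the multiple tests. It does not address the dependence between them.

```python
from math import ceil
from statistics import NormalDist


def pair_n(p1: float, p2: float, alpha: float, power: float) -> int:
    """Approximate per-group sample size for a two-sided z-test of p1 vs p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


def n_for_ranking(shares, top_k=10, alpha=0.05, power=0.80, bonferroni=True):
    """Largest sample size needed across adjacent pairs among the top_k + 1 shares."""
    ranked = sorted(shares, reverse=True)[: top_k + 1]
    pairs = list(zip(ranked, ranked[1:]))  # (1st, 2nd), (2nd, 3rd), ...
    if not pairs:
        raise ValueError("need at least two shares")
    if any(a == b for a, b in pairs):
        raise ValueError("tied shares cannot be separated by any sample size")
    # Optional Bonferroni correction: split alpha across the tests.
    per_test_alpha = alpha / len(pairs) if bonferroni else alpha
    return max(pair_n(a, b, per_test_alpha, power) for a, b in pairs)


# Hypothetical guessed shares for the most common problems.
guessed_shares = [0.20, 0.09, 0.07, 0.04]
print(n_for_ranking(guessed_shares, top_k=3))
```

The closest adjacent pair (here 9% vs 7%) dominates the result, which is why a rough prior guess at the shares matters so much for this approach.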

rolando2
0

I think you can use the Chernoff Bound
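One way to make this concrete, as a rough sketch rather than a definitive answer: assume Hoeffding's inequality (a Chernoff-type bound) together with a union bound over the k categories. With

$$n \ge \frac{\ln(2k/\delta)}{2\varepsilon^2}$$

categorized calls, every observed share is within ε of its true share with probability at least 1 − δ. If the true 10th and 11th most common problems differ by more than 2ε, the observed top 10 is then the true top 10 as a set. The function name and the example values of ε and δ below are my own assumptions.

```python
from math import ceil, log


def hoeffding_sample_size(num_categories: int, epsilon: float, delta: float = 0.05) -> int:
    """Sample size so that all category shares are within epsilon with prob. >= 1 - delta.

    Assumption: Hoeffding's inequality per category plus a union bound,
    n >= ln(2k / delta) / (2 * epsilon^2).
    """
    if num_categories < 1 or not 0 < epsilon < 1 or not 0 < delta < 1:
        raise ValueError("need num_categories >= 1 and 0 < epsilon, delta < 1")
    return ceil(log(2 * num_categories / delta) / (2 * epsilon ** 2))


# 200 problems, shares accurate to within 1 percentage point, 95% confidence.
print(hoeffding_sample_size(200, epsilon=0.01, delta=0.05))
```

The bound does not depend on the population size. Hoeffding's inequality also holds when sampling without replacement from a finite population, so it is conservative for small populations, and it is conservative in general because it ignores how small most of the 200 shares are.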

Nick