
I have a population of entities associated with different categories, say

blue    50000
red     300
green   80
yellow  10
pink    6
orange  3
white   2

The distribution is known exactly, down to the counts. It is very homogeneous in the sense that a single category dominates: if $\{p_1,\dots,p_k\}$ denotes the probabilities of the categories, then $\max_i p_i \geq 0.95$.

Now I would like to choose a sample size such that the expected number of categories $N_k$ present in the sample is $n_k$, i.e. the expected number of categories whose count in the sample is larger than zero is $n_k$. Given a count vector as shown above and a target $n_k$, how do I choose the sample size?
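
To make the target quantity concrete, here is a minimal sketch of one way it could be computed. It assumes sampling without replacement from the finite population, which the question does not state. Under that assumption, a category with count $c_i$ in a population of size $N$ is absent from a sample of size $m$ with probability $\binom{N-c_i}{m}/\binom{N}{m}$, so the expected number of categories present is $\sum_i \left(1 - \binom{N-c_i}{m}/\binom{N}{m}\right)$. This is non-decreasing in $m$, so a bisection finds the smallest $m$ that reaches $n_k$. The function names `expected_categories` and `sample_size_for` are made up for illustration.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def expected_categories(counts, m):
    """Expected number of categories present in a sample of size m.

    Assumption: the sample is drawn without replacement from a
    population with the given per-category counts.
    """
    total = sum(counts)
    expected = 0.0
    for c in counts:
        if m > total - c:
            # The sample is too large to avoid this category entirely.
            p_absent = 0.0
        else:
            # Hypergeometric probability of drawing zero items of this category.
            p_absent = exp(log_comb(total - c, m) - log_comb(total, m))
        expected += 1.0 - p_absent
    return expected


def sample_size_for(counts, target):
    """Smallest m whose expected number of categories is at least target."""
    if not 0 <= target <= len(counts):
        raise ValueError("target must lie between 0 and the number of categories")
    lo, hi = 0, sum(counts)
    # Bisection works because the expectation is non-decreasing in m.
    while lo < hi:
        mid = (lo + hi) // 2
        if expected_categories(counts, mid) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


counts = [50000, 300, 80, 10, 6, 3, 2]  # blue, red, green, yellow, pink, orange, white
m = sample_size_for(counts, 3)
print(m, expected_categories(counts, m))
```

If the sampling were with replacement instead, the absence probability would be $(1-p_i)^m$ with $p_i = c_i/N$, and the same bisection would apply. The log-gamma route is only accurate to floating-point precision, so a target exactly on the boundary (such as $n_k = k$) may be off by a few units of sample size.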

barbaz
  • possible duplicate of [How often do you have to roll a 6-sided dice to obtain every number at least once?](http://stats.stackexchange.com/questions/48396/how-often-do-you-have-to-roll-a-6-sided-dice-to-obtain-every-number-at-least-onc) – Sycorax Mar 12 '14 at 14:51

0 Answers