
I apologize if my formulation is a bit sloppy.

Let us assume that we have an abstract urn containing balls of $S$ different colors.
The probability that color $i$ (with $i = 1, \dots, S$) has $n$ balls is $P(N=n)$, where $N \sim \mathrm{Log}(p)$ in my case. Thus, on average, the number of colors having $n$ balls is simply $S \cdot P(N=n)$.
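
For reference, assuming the standard parameterization of the logarithmic (log-series) distribution with $0 < p < 1$, the probability mass function and the resulting expected number of colors with exactly $n$ balls would be:

$$P(N=n) = \frac{-1}{\ln(1-p)}\,\frac{p^{n}}{n}, \qquad n = 1, 2, 3, \dots$$

$$E\big[\#\{i : N_i = n\}\big] = S \cdot P(N=n) = \frac{-S}{\ln(1-p)}\,\frac{p^{n}}{n}$$

Under this parameterization every color has at least one ball, since the support starts at $n = 1$.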

The experiment:
An urn is created by drawing one concrete realization from the distribution above. We then count the number of balls of each color and sort the colors by frequency.

The question: What is the mean (normalized) frequency of the most frequent (second most frequent and so on) color?

Any help is appreciated :)

  • Hint: order statistics. I have to admit though, I don't know what your $P(N = n)$ is supposed to be, given that you haven't told us what $p$ is. You need to make clearer what the multivariate probability distribution is governing the number of balls of various colors. – Mark L. Stone May 09 '16 at 18:40
  • $P(N=n)$ is the probability that a certain color has $n$ balls. They are independent. Thus the multivariate probability distribution is just the product. $p$ is the parameter of the logarithmic distribution. – Sebastian Lehmann May 09 '16 at 18:51
  • Do you mean "logarithmic distribution" as in https://en.wikipedia.org/wiki/Logarithmic_distribution? If not, then what? If so, then do you know $p$ or not? Also, since urns are always used as physical metaphors for sampling procedures, it is very strange that you are not sampling from this urn once it is filled. Why introduce an urn at all, then? It seems like a meaningless (and therefore confusing) abstraction. – whuber May 09 '16 at 20:51
  • Yes, it is the "logarithmic distribution" as on Wikipedia, and yes, I know $p$ beforehand. Currently I create one urn and sample from it. The problem is that every time I start my program the "rank distribution" of the colors is different, and a larger simulation depends on this distribution. Therefore, I want to sample directly from the mean rank distribution. Order statistics are the right way to calculate the mean, but they introduce very large combinatorial numbers, which are impractical in my program. Another possibility is to create a set of urns and calculate the mean beforehand. – Sebastian Lehmann May 10 '16 at 04:36
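
The last option mentioned in the comments, creating a set of urns and averaging beforehand, can be sketched as a small Monte Carlo estimate. This is only a sketch under stated assumptions: the color counts are taken to be independent draws from the logarithmic distribution (sampled here with NumPy's `Generator.logseries`), and the function name `mean_rank_frequencies` as well as the values `S=50`, `p=0.9`, and `n_urns=2000` are made up for illustration.

```python
import numpy as np

def mean_rank_frequencies(S, p, n_urns=2000, seed=0):
    """Estimate the mean normalized frequency of the k-th most frequent color.

    Hypothetical helper: assumes S independent color counts per urn,
    each drawn from the logarithmic distribution with parameter p.
    """
    rng = np.random.default_rng(seed)  # fixed seed gives the same result on every run
    # One row per urn, one column per color.
    counts = rng.logseries(p, size=(n_urns, S))
    # Sort each urn's counts in descending order (rank 1 = most frequent color).
    ranked = -np.sort(-counts, axis=1)
    # Normalize by the total number of balls in each urn.
    freqs = ranked / ranked.sum(axis=1, keepdims=True)
    # Average over urns, rank by rank.
    return freqs.mean(axis=0)

# Illustrative values only.
mean_freq = mean_rank_frequencies(S=50, p=0.9)
print(mean_freq[:5])  # estimated mean frequencies of the five most frequent colors
```

The returned vector sums to 1 and is non-increasing in rank, so it could be used directly as a fixed rank distribution to sample from. Its accuracy depends on `n_urns`, and the normalization by each urn's total is done per urn before averaging, which is one possible reading of "mean normalized frequency".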
