
Set-up

I was simulating the generalized Chinese Restaurant Process, as described on the Wikipedia page [link], with discount parameter $\alpha$ and concentration parameter $\theta$.

For $n=5$ total customers being seated, with $\alpha=0.3$ and $\theta=1.5$, over 100,000 simulations I saw the following distribution of the number of tables needed:

  • 1 | 5.6%
  • 2 | 20.2%
  • 3 | 33.5%
  • 4 | 29.3%
  • 5 | 11.4%

with an expected value of 3.207 tables, which matches the theoretical answer (link) for these parameters.
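The simulation can be sketched as follows. This is my reconstruction of the seating rule from the Wikipedia page, not the original code; the function name `simulate_crp` and the use of Python's `random` module are my own choices.

```python
import random

def simulate_crp(n, alpha, theta, rng):
    """Seat n customers by the generalized (two-parameter) CRP rule
    and return the resulting list of table sizes.

    With i customers already seated and K tables open, the next
    customer starts a new table with probability
    (theta + K*alpha)/(i + theta), and joins an existing table of
    size s with probability (s - alpha)/(i + theta).
    """
    tables = []
    for i in range(n):
        k = len(tables)
        if rng.random() < (theta + k * alpha) / (i + theta):
            tables.append(1)              # open a new table
        else:
            # pick an existing table with weight (size - alpha);
            # the total existing weight is i - k*alpha
            r = rng.random() * (i - k * alpha)
            for j, size in enumerate(tables):
                r -= size - alpha
                if r < 0:
                    tables[j] += 1
                    break
            else:
                tables[-1] += 1           # guard against float round-off
    return tables

# Estimate the distribution of the number of tables for n = 5
rng = random.Random(0)
counts = [0] * 6
for _ in range(100_000):
    counts[len(simulate_crp(5, 0.3, 1.5, rng))] += 1
```

Averaging the number of tables over the 100,000 runs reproduces the expected value of roughly 3.21 tables quoted above.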

Question

Does there exist a formula to determine the probability of different frequency distributions among those tables?

For example, within the simulation the following frequency distributions appeared with the following probabilities. The notation $[n_1,\dots,n_5]$ means that $n_k$ is the number of tables with exactly $k$ people, so $\sum_k k\,n_k = n$:

  • [0,0,0,0,1] 5.6%
  • [0,1,1,0,0] 6.9%
  • [1,0,0,1,0] 13.3%
  • [1,2,0,0,0] 12.8%
  • [2,0,1,0,0] 20.7%
  • [3,1,0,0,0] 29.3%
  • [5,0,0,0,0] 11.4%

(29.3% of the time the frequency distribution was 3 tables with 1 person and 1 table with 2 people, whereas only 5.6% of the time did a single table hold all five people.)

Notice that the 33.5% probability observed for having three tables matches the sum over the individual frequency distributions that also have three tables (12.8% + 20.7%). Likewise for the other table counts.
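For small $n$, the exact probabilities of these configurations can be brute-forced by enumerating every possible seating history and its probability under the same seating rule. This is a sketch (the name `config_probs` is mine):

```python
def config_probs(n, alpha, theta):
    """Exact distribution over table-size configurations after n
    customers, found by recursing over every seating decision."""
    probs = {}
    def rec(tables, p, i):
        if i == n:
            key = tuple(sorted(tables, reverse=True))
            probs[key] = probs.get(key, 0.0) + p
            return
        k = len(tables)
        # next customer opens a new table ...
        rec(tables + [1], p * (theta + k * alpha) / (i + theta), i + 1)
        # ... or joins existing table j with weight (size - alpha)
        for j in range(k):
            grown = tables[:j] + [tables[j] + 1] + tables[j + 1:]
            rec(grown, p * (tables[j] - alpha) / (i + theta), i + 1)
    rec([], 1.0, 0)
    return probs

probs = config_probs(5, 0.3, 1.5)
# probs[(2, 1, 1, 1)] is the exact probability of "3 tables of 1 plus
# 1 table of 2", about 0.293, matching the simulated 29.3% above
```

The exact values (about 29.3% for $[3,1,0,0,0]$, 11.3% for $[5,0,0,0,0]$, 5.5% for $[0,0,0,0,1]$) agree with the simulated frequencies to within sampling noise.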

Reason

The reason for asking the question: suppose I know the frequency distribution of the number of tables that contain $k$ people for a large number of customers, say 1,000 customers, and it was:

$n_k=\{500,100,40,30,12\}$

(500 tables have 1 person, 100 have two people, 40 have three, etc)

Is there a way to determine the discount ($\alpha$) and concentration parameter ($\theta$) that would yield the maximum likelihood of producing that known observed outcome?

sheppa28

2 Answers


Your data {500,100,40,30,12} implies that 500+100+40+30+12 customers chose to sit at a new table, 100+40+30+12 customers chose to sit at a table that had exactly 1 customer at the time, 40+30+12 chose to sit at a table that had exactly 2 customers at the time, etc.
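In code, these counts are just tail sums of the data. A small sketch (`transition_counts` is a hypothetical helper name):

```python
def transition_counts(n_k):
    """Given n_k[j-1] = number of tables whose final size is j, recover
    how many customers started a new table and how many joined a table
    that had exactly s occupants at the time.

    A table reaches size > s iff some customer joined it when it had
    exactly s occupants, and every table was started by one customer.
    """
    tails = [sum(n_k[s:]) for s in range(len(n_k))]
    # (new-table count, joins at sizes 1, 2, ...)
    return tails[0], tails[1:]

new, joins = transition_counts([500, 100, 40, 30, 12])
# new == 682, joins == [182, 82, 42, 12]
```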

Suppose you also knew the ordering of these 1000 customers. Given $\alpha$ and $\theta$, the probability of their sequence of 1000 choices would be easy to work out -- it is a product of the 1000 independent (but not identical) probabilities of their individual decisions. Those probabilities are just as in your simulation.

But that is just the likelihood of $\alpha,\theta$, which you can maximize by gradient ascent on the log-likelihood.

Now, you don't actually know the ordering, so you have to sum over possible orderings. But as it turns out, the probability above doesn't depend on the ordering -- that's what it means to say that the process is exchangeable. So just pick any convenient ordering and compute its probability. (For example, assume that the customers that start a new table show up first, and then the customers that sit at a table with exactly 1 previous customer, etc. You should be able to verify that changing the ordering wouldn't change the probability.)

This is the probability of a single ordering. To get your actual likelihood, you'd need to multiply by the number of possible orderings. But that number is a constant that depends only on your data and not on $\alpha,\theta$. So you can ignore it for the purposes of finding the likelihood maximizer. In other words, just find the $\alpha,\theta$ that maximize the probability of the single ordering.
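Putting this together, here is one possible sketch of the single-ordering log-likelihood plus a crude grid-search maximizer. The closed-form products in the docstring follow from multiplying the individual seating probabilities; the names `loglik` and `mle` and the grid ranges are my own choices, and a proper optimizer (e.g. the gradient ascent mentioned above) would refine the grid result.

```python
from math import lgamma, log

def loglik(alpha, theta, n_k):
    """Log-probability of one convenient ordering of the customers,
    given n_k[j-1] = number of tables whose final size is j.

    Multiplying the individual seating probabilities gives
      prod_{i=1}^{K-1} (theta + i*alpha)               # new-table picks
      / prod_{j=1}^{N-1} (theta + j)                   # denominators
      * prod_tables (1-alpha)(2-alpha)...(s-1-alpha)   # join picks
    with K tables and N customers in total; rising factorials are
    written with lgamma for numerical stability.
    """
    K = sum(n_k)
    N = sum((j + 1) * c for j, c in enumerate(n_k))
    ll = sum(log(theta + i * alpha) for i in range(1, K))
    ll -= lgamma(theta + N) - lgamma(theta + 1)
    ll += sum(c * (lgamma(j + 1 - alpha) - lgamma(1 - alpha))
              for j, c in enumerate(n_k))
    return ll

def mle(n_k):
    """Brute-force grid search over alpha in (0,1), theta in (0,20]."""
    return max(((loglik(a / 100, t / 10, n_k), a / 100, t / 10)
                for a in range(2, 100, 2) for t in range(5, 201, 5)),
               key=lambda x: x[0])[1:]
```

For the data above, `mle([500, 100, 40, 30, 12])` returns the grid point with the highest single-ordering probability.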


For the original CRP, the number of tables $m$ occupied after $N$ draws has pmf:

$$P(M=m | \alpha, N)=s(N, m)\alpha^m \frac{\Gamma(\alpha)}{\Gamma(\alpha+N)}$$

where $s(N,m)$ are unsigned Stirling numbers of the first kind. This is explained here. Perhaps this formula generalizes to the generalized CRP?
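This pmf is easy to compute with the standard recurrence $s(n+1,m) = s(n,m-1) + n\,s(n,m)$ for unsigned Stirling numbers of the first kind. A sketch (`table_count_pmf` is my name for it):

```python
from math import exp, lgamma

def table_count_pmf(N, alpha):
    """P(M = m | alpha, N) for the one-parameter CRP, m = 0..N."""
    # unsigned Stirling numbers of the first kind via the recurrence
    # s(n+1, m) = s(n, m-1) + n * s(n, m)
    s = [[0] * (N + 1) for _ in range(N + 1)]
    s[0][0] = 1
    for n in range(N):
        for m in range(1, n + 2):
            s[n + 1][m] = s[n][m - 1] + n * s[n][m]
    # Gamma(alpha) / Gamma(alpha + N), via lgamma for stability
    norm = exp(lgamma(alpha) - lgamma(alpha + N))
    return [s[N][m] * alpha ** m * norm for m in range(N + 1)]

# sanity check: with alpha = 1 the CRP seats 5 customers at m tables
# with probability s(5, m) / 5!
pmf = table_count_pmf(5, 1.0)
```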

Tim Hopper