Set-up
I was simulating the generalized Chinese restaurant process, as described on the Wikipedia page [link], with discount parameter $\alpha$ and concentration parameter $\theta$.
For $n=5$ customers with $\alpha=0.3$ and $\theta=1.5$, 100,000 simulations gave the following distribution of the number of occupied tables:
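A minimal sketch of such a simulation in Python, assuming the usual two-parameter seating rule: after $m$ customers are seated at $K$ tables, the next customer joins table $i$ (which has $c_i$ customers) with probability $(c_i-\alpha)/(m+\theta)$ and starts a new table with probability $(\theta+K\alpha)/(m+\theta)$. The function names `seat_customers` and `table_count_distribution` are hypothetical, not from any library.

```python
import random
from collections import Counter


def seat_customers(n, alpha, theta, rng):
    """Seat n customers one at a time and return the list of table sizes."""
    tables = []  # tables[i] = number of customers at table i
    for m in range(n):  # m customers are already seated
        k = len(tables)
        # Existing table i has weight (tables[i] - alpha); a new table has
        # weight (theta + k * alpha). Both share the denominator m + theta,
        # so unnormalized weights are enough for rng.choices.
        weights = [size - alpha for size in tables] + [theta + k * alpha]
        choice = rng.choices(range(k + 1), weights=weights)[0]
        if choice == k:
            tables.append(1)  # open a new table
        else:
            tables[choice] += 1  # join an existing table
    return tables


def table_count_distribution(n, alpha, theta, sims, seed=0):
    """Empirical distribution of the number of occupied tables."""
    rng = random.Random(seed)
    counts = Counter(len(seat_customers(n, alpha, theta, rng)) for _ in range(sims))
    return {k: counts[k] / sims for k in sorted(counts)}


if __name__ == "__main__":
    dist = table_count_distribution(5, 0.3, 1.5, 100_000)
    for k, p in dist.items():
        print(f"{k} | {p:.1%}")
    print("mean:", sum(k * p for k, p in dist.items()))
```

Using unnormalized weights avoids dividing by $m+\theta$ at every step; `random.Random.choices` normalizes them internally.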
- 1 | 5.6%
- 2 | 20.2%
- 3 | 33.5%
- 4 | 29.3%
- 5 | 11.4%
The mean was 3.207 tables, which matches the theoretical expected value (link) for these parameters.
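For reference, one standard closed form for the expected number of tables in the two-parameter process (valid for $\alpha>0$) is $E[K_n] = \frac{\theta}{\alpha}\left(\frac{(\theta+\alpha)^{(n)}}{\theta^{(n)}}-1\right)$, where $x^{(n)}=x(x+1)\cdots(x+n-1)$ is the rising factorial; I am assuming this is the result behind the link. The sketch below evaluates it and cross-checks it against the exact recursion $E[K_{m+1}] = E[K_m] + (\theta+\alpha E[K_m])/(m+\theta)$, obtained by taking expectations of the new-table probability. The function names are hypothetical.

```python
import math


def rising(x, n):
    """Rising factorial x (x + 1) ... (x + n - 1)."""
    return math.prod(x + i for i in range(n))


def expected_tables_closed_form(n, alpha, theta):
    """E[K_n] = (theta / alpha) * (rising(theta + alpha, n) / rising(theta, n) - 1), alpha > 0."""
    return theta / alpha * (rising(theta + alpha, n) / rising(theta, n) - 1)


def expected_tables_recursive(n, alpha, theta):
    """Same quantity via E[K_{m+1}] = E[K_m] + (theta + alpha * E[K_m]) / (m + theta)."""
    e = 0.0
    for m in range(n):  # m customers already seated
        e += (theta + alpha * e) / (m + theta)
    return e


print(round(expected_tables_closed_form(5, 0.3, 1.5), 3))  # 3.207
print(round(expected_tables_recursive(5, 0.3, 1.5), 3))    # 3.207
```

The recursion also works for $\alpha=0$, where the closed form above is undefined.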
Question
Is there a formula for the probability of each frequency distribution of customers among the tables?
For example, the simulation produced the following frequency distributions with the following probabilities. Each is written as a vector $[n_1, n_2, n_3, n_4, n_5]$, where $n_k$ is the number of tables seating exactly $k$ people, so that $\sum_k k\,n_k = n$:
- [0,0,0,0,1] 5.6%
- [0,1,1,0,0] 6.9%
- [1,0,0,1,0] 13.3%
- [1,2,0,0,0] 12.8%
- [2,0,1,0,0] 20.7%
- [3,1,0,0,0] 29.3%
- [5,0,0,0,0] 11.4%
(For example, 29.3% of the time there were 3 tables with one person each and 1 table with two, whereas only 5.6% of the time a single table held all five people.)
Note that the 33.5% observed for three tables matches the sum of the probabilities of the frequency distributions that have three tables ($12.8\% + 20.7\%$), and likewise for the other table counts.
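The bookkeeping behind these frequency vectors, and the consistency check just described, can be sketched in Python. `frequency_vector` is a hypothetical helper that turns a list of table sizes into $[n_1,\dots,n_n]$; the percentages are the simulated values quoted above.

```python
from collections import Counter, defaultdict


def frequency_vector(table_sizes, n):
    """Map table sizes to [n_1, ..., n_n], where n_k = number of tables with k people."""
    counts = Counter(table_sizes)
    return [counts[k] for k in range(1, n + 1)]


# Simulated frequencies quoted in the post (percent), keyed by (n_1, ..., n_5).
observed = {
    (0, 0, 0, 0, 1): 5.6,
    (0, 1, 1, 0, 0): 6.9,
    (1, 0, 0, 1, 0): 13.3,
    (1, 2, 0, 0, 0): 12.8,
    (2, 0, 1, 0, 0): 20.7,
    (3, 1, 0, 0, 0): 29.3,
    (5, 0, 0, 0, 0): 11.4,
}

by_tables = defaultdict(float)
for vec, pct in observed.items():
    # Every vector must account for all n = 5 customers: sum_k k * n_k = 5.
    assert sum(k * nk for k, nk in enumerate(vec, start=1)) == 5
    by_tables[sum(vec)] += pct  # number of tables K = sum_k n_k

for K in sorted(by_tables):
    print(K, round(by_tables[K], 1))
```

Grouping by $K=\sum_k n_k$ should reproduce the table-count percentages listed in the set-up; for instance, `frequency_vector([1, 1, 1, 2], 5)` gives `[3, 1, 0, 0, 0]`.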
Reason
The reason I ask: suppose I know, for a large number of customers (say, 1,000), how many tables contain $k$ people, and the counts are:
$[n_1, n_2, n_3, n_4, n_5] = [500, 100, 40, 30, 12]$
(500 tables have one person, 100 have two, 40 have three, and so on.)
Is there a way to find the discount $\alpha$ and concentration parameter $\theta$ that maximize the likelihood of this observed outcome?