To demonstrate a solution to this hyperprior problem, I implemented a hierarchical gamma-Dirichlet-multinomial model in PyMC3. The gamma prior for the Dirichlet is specified and sampled as described in Ted Dunning's blog post.
The model I implemented can be found at this Gist but is also described below:
This is a Bayesian hierarchical (pooling) model for movie ratings. Each movie can be rated on a scale from zero to five. Each movie is rated several times. We want to find a smoothed distribution of ratings for each movie.
We are going to learn a top-level prior distribution (hyperprior) on movie ratings from the data. Each movie will then have its own prior that is smoothed by this top-level prior. Another way of thinking about this is that the prior for ratings for each movie will be shrunk towards the group-level, or pooled, distribution.
If a movie has an atypical rating distribution, this approach will shrink its ratings towards something more in line with what is expected. Furthermore, the learned prior can be used to bootstrap movies with few ratings, so that they can be meaningfully compared to movies with many ratings.
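One way to see the shrinkage concretely, conditional on the top-level prior and using the symbols defined in the model that follows: by Dirichlet-categorical conjugacy, the smoothed probability of rating level $k$ for movie $m$ is given by the expression below. Here $n_{mk}$, the number of ratings at level $k$ for movie $m$, is notation introduced only for this illustration.

$E[\theta_{mk} \mid z_m, \gamma] = \frac{c\gamma_k + n_{mk}}{c\sum_{j=1}^{K}\gamma_j + N_m}$

When $N_m$ is small the estimate is dominated by the pooled proportions $\gamma_k / \sum_j \gamma_j$; as $N_m$ grows it approaches the movie's own empirical frequency $n_{mk} / N_m$.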
The model is as follows:
$\gamma_{k=1 \ldots K} \sim \mathrm{Gamma}(\alpha, \beta)$
$\theta_{m=1 \ldots M} \sim \mathrm{Dirichlet}_K(c\gamma_1, \ldots, c\gamma_K)$
$z_{m=1 \ldots M,\, n=1 \ldots N_m} \sim \mathrm{Categorical}_K(\theta_m)$
where:
- $K$: number of movie rating levels (e.g. $K = 6$ implies ratings 0, ..., 5)
- $M$: number of movies being rated
- $N_m$: number of ratings for movie $m$
- $\alpha = 1 / K$: shape parameter, chosen so that the sum of the $K$ gamma r.v.s is exponentially distributed with rate $\beta$ (a sum of $K$ independent $Gamma(1/K, \beta)$ variables is $Gamma(1, \beta)$)
- $\beta$: rate parameter for the exponential top-level prior
- $c$: concentration parameter dictating the strength of the top-level prior
- $\gamma_k$: top-level prior for rating level $k$
- $\theta_m$: movie-level prior for rating levels (a probability vector of dimension $K$)
- $z_{mn}$: rating $n$ for movie $m$
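
To make the generative story concrete, here is a minimal forward-simulation sketch in plain Python. It is not the PyMC3 model from the Gist and performs no inference; it only draws from the three distributions above in order. The function name `simulate_ratings`, the default values for `beta` and `c`, and the small numerical floor `eps` are assumptions made for this illustration.

```python
import random


def simulate_ratings(num_ratings, K=6, beta=1.0, c=20.0, seed=0):
    """Forward-sample the model for movies with the given rating counts N_m.

    Illustrative sketch only: the defaults for beta and c are arbitrary.
    """
    rng = random.Random(seed)
    eps = 1e-12  # numerical floor (an assumption of this sketch, not part of the model)

    # Top-level prior: gamma_k ~ Gamma(shape=1/K, rate=beta).
    # random.gammavariate expects a *scale* parameter, so pass 1/beta.
    gamma = [max(rng.gammavariate(1.0 / K, 1.0 / beta), eps) for _ in range(K)]

    thetas, ratings = [], []
    for n_m in num_ratings:
        # theta_m ~ Dirichlet(c * gamma), sampled by normalizing
        # independent Gamma(c * gamma_k, 1) draws.
        draws = [max(rng.gammavariate(c * g, 1.0), eps) for g in gamma]
        total = sum(draws)
        theta = [d / total for d in draws]
        thetas.append(theta)

        # z_mn ~ Categorical(theta_m), with rating levels 0, ..., K-1.
        ratings.append(rng.choices(range(K), weights=theta, k=n_m))

    return gamma, thetas, ratings


if __name__ == "__main__":
    gamma, thetas, ratings = simulate_ratings([3, 50, 500])
    for theta, z in zip(thetas, ratings):
        print(len(z), [round(p, 3) for p in theta])
```

Each `theta` sums to one and scatters around the normalized `gamma` vector; increasing `c` pulls the per-movie distributions closer to that pooled distribution, which is the role the concentration parameter plays in the model.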