I'm interested in calculating the log probability of data drawn from a Dirichlet distribution. In particular, I'm interested in calculations that are stable in high dimensions, perhaps 1000 dimensions or more.
I've tried the "dirichlet_logProb.m" function from Tom Minka's fastfit toolbox: http://research.microsoft.com/en-us/um/people/minka/software/fastfit/. However, in high dimensions it seems to assign unexpectedly large positive log probabilities (i.e., densities far greater than one) to every data vector I provide.
Certainly, when a distribution is sharply peaked, we can have arbitrarily large (greater than one) density values at some points in the support. For example, think of a zero-mean Normal distribution with very small variance... it will give very large density (>>1) to x close to zero, but vanishingly small density (<<1) to x more than a few standard deviations away.
Intuitively, the same should be true of any distribution: since the density must integrate to one over the support, we should always be able to find both high-density and low-density points. However, in high dimensions I cannot get my Dirichlet log probability calculations to work out this way... every single point I try has density greater than one.
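To make the Normal intuition concrete, here is a quick Python check (the choice sigma = 1e-4 is arbitrary on my part; this uses only the standard library):

```python
import math

def normal_logpdf(x, mu=0.0, sigma=1e-4):
    """Log density of a Normal(mu, sigma^2) distribution at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

print(normal_logpdf(0.0))   # at the peak: large positive log density (density >> 1)
print(normal_logpdf(1e-3))  # ten standard deviations out: very negative (density << 1)
```

So for a sharply peaked Normal, some points have log density well above zero and others well below, exactly as expected.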
Here's a sketch of my experiment: I consider a symmetric Dirichlet with concentration parameter lambda < 1. This should favor sparse data, where only a few dimensions carry significant probability mass and the rest are near zero. I then look at three possible draws, illustrated here with K=4 dimensions:
- Extremely sparse: x = [1 0 0 0]
- Halfway uniform: x = [0.5 0.5 0 0]
- Uniform: x = [0.25 0.25 0.25 0.25]
Given concentration parameter lambda < 1, we should expect outcome 1 to have much larger density than outcome 2, and outcome 2 much larger density than outcome 3. Since the uniform vector is essentially the opposite of sparse, we should also expect it to have negligible (less than one) density... which corresponds to negative log probability.
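My actual code is the MATLAB gist linked below, but the computation is easy to sketch in Python. Note the epsilon used to clip exact zeros is my own choice here (so the numbers for the zero-containing draws depend on it), while the uniform draw involves no clipping and should match my table exactly:

```python
import math

def symmetric_dirichlet_logpdf(x, lam, eps=1e-10):
    """Log density of a symmetric Dirichlet(lam) at x.
    Exact zeros are clipped to eps and x is renormalized, so values
    for sparse draws depend on the (arbitrary) choice of eps."""
    K = len(x)
    x = [max(xi, eps) for xi in x]
    total = sum(x)
    x = [xi / total for xi in x]  # push the clipped vector back onto the simplex
    log_normalizer = math.lgamma(K * lam) - K * math.lgamma(lam)
    return log_normalizer + (lam - 1.0) * sum(math.log(xi) for xi in x)

K, lam = 4, 0.01
sparse  = [1.0] + [0.0] * (K - 1)
halfway = [0.5, 0.5] + [0.0] * (K - 2)
uniform = [1.0 / K] * K
for name, x in [("sparse", sparse), ("halfway", halfway), ("uniform", uniform)]:
    print(name, symmetric_dirichlet_logpdf(x, lam))
```

The expected ordering (sparse > halfway > uniform) does hold; the puzzle described below is about the absolute values.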
Here are the computed log probabilities for these three outcomes, across several values of the dimension K, with concentration lambda = 0.01. Remember, these are log probabilities, so positive values imply density > 1 and negative values imply density < 1.
   K    [1 0 ... 0]   [unif 0 ... 0]   [uniform]
   4      9.18e+01        5.59e+01    -9.71e+00
  10      2.77e+02        1.43e+02    -2.09e+01
  50      1.52e+03        7.85e+02    -3.58e+01
 100      3.07e+03        1.64e+03    -4.04e+00
 500      1.55e+04        8.98e+03     7.80e+02
1000      3.11e+04        1.87e+04     2.25e+03
You can reproduce this table with this code: http://gist.github.com/3699842
The problem, of course, is that at K=500 and K=1000 all three outcomes have large positive log probabilities (density >> 1). This seems troublesome to me, since I'd expect the most unlikely outcome (the uniform) to have density less than one... am I mistaken somehow?
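One diagnostic I can offer: for the symmetric case the log density splits into a log-normalizer term, lgamma(K*lambda) - K*lgamma(lambda), plus a data term, (lambda-1) * sum(log x_i). Evaluating both pieces at the uniform draw (a Python sketch I wrote; `math.lgamma` is the stdlib log-gamma) reproduces the uniform column of my table exactly, which makes simple overflow or cancellation seem less likely:

```python
import math

def logpdf_terms(K, lam=0.01):
    """Split the symmetric-Dirichlet log density at the uniform point
    x_i = 1/K into its log-normalizer and data-dependent terms."""
    log_normalizer = math.lgamma(K * lam) - K * math.lgamma(lam)
    data_term = (lam - 1.0) * K * math.log(1.0 / K)
    return log_normalizer, data_term

for K in [4, 10, 50, 100, 500, 1000]:
    a, b = logpdf_terms(K)
    print(K, a, b, a + b)
```

Both terms stay finite and well-scaled at every K I tried; they just grow at different rates, and their sum changes sign somewhere between K=100 and K=500.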
Can anybody suggest an answer? Is this a numerical issue, or do I misunderstand something about the Dirichlet? My prime suspect is the fact that the support region is a bit funny: the probability simplex, rather than all of R^K.