I was looking for a measure of the "spikiness" of categorical histograms: if a histogram becomes unnaturally skewed towards a certain value at a given time, I want a metric that spikes at that time. I considered a variety of metrics for this purpose and finally settled on the entropy of a Dirichlet distribution (treating the histogram of counts as the parameters of a Dirichlet and using the corresponding entropy as my metric). For this, I used the formula for the entropy in the Wikipedia article: $$H = \log B(\alpha) + (\alpha_0 - K)\,\psi(\alpha_0) - \sum_{i=1}^{K}(\alpha_i - 1)\,\psi(\alpha_i)$$
Here, $\alpha$ is the vector of counts in the categorical bins of the histogram, $\alpha_0 = \sum_i \alpha_i$, $K$ is the number of bins, $B$ is the multivariate beta function, and $\psi$ is the digamma function. I implemented this in C# (implementation pasted below) and am having some trouble interpreting the results. I would expect an $\alpha$ with a flat distribution (uniformly spread across its categories) to have a higher entropy than one that is spiked towards a given category, and this holds true. The gap in interpretation arises between various kinds of flat distributions. My expectation was that an alpha described by the array {x, x, x} would have increasingly higher entropy as x increased, because a larger sample should make us increasingly sure that the distribution is flat. What I see in practice is this:
| x   | Entropy  |
|-----|----------|
| 0.1 | -13.025  |
| 0.6 | -4.82    |
| 1.0 | -0.693   |
| 1.6 | -0.8164  |
| 2.1 | -0.967   |
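As a sanity check on these numbers, the formula specializes for a symmetric $\alpha = (x, x, x)$ (so $K = 3$ and $\alpha_0 = 3x$) to
$$H(x) = 3\log\Gamma(x) - \log\Gamma(3x) + (3x - 3)\,\psi(3x) - 3(x - 1)\,\psi(x)$$
At $x = 1$ both digamma terms vanish and $\log\Gamma(1) = 0$, leaving $H = -\log\Gamma(3) = -\log 2 \approx -0.693$, which matches the table above, so the values do appear to follow from the formula.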
As you can see, there seems to be a maximum at x = 1, which goes against my intuition. Can anyone help me interpret this, or let me know if these results are rubbish? For reference, my C# implementation is below.
private static double regularizer = 0.1;

/// <summary>
/// Based on the formula for the information entropy of a Dirichlet distribution given here -
/// https://en.wikipedia.org/wiki/Dirichlet_distribution.
/// </summary>
/// <param name="alpha">The parameters of the Dirichlet distribution. These correspond to a histogram of counts.</param>
/// <returns>The entropy of the Dirichlet distribution.</returns>
public double Entropy(double[] alpha)
{
    _2_gammafamily g = new _2_gammafamily();
    double alpha_0 = 0, H = 0; // The sum of the parameters (normalizing factor) and the accumulated entropy, respectively.
    int K = alpha.Length;
    for (int i = 0; i < K; i++)
    {
        // Regularize a local copy of each parameter (equivalent to a uniform prior)
        // rather than mutating the caller's array in place.
        double a = alpha[i] + regularizer;
        alpha_0 += a;
        H += g.Gammaln(a); // Positive part of the normalization constant (the log of the multivariate beta function).
        H -= (a - 1) * g.Digamma(a); // The contribution from each of the alphas.
    }
    H -= g.Gammaln(alpha_0); // Negative part of the normalization constant.
    H += (alpha_0 - K) * g.Digamma(alpha_0); // The contribution from the normalizing factor.
    return H;
}
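For completeness, here is a minimal driver for the sweep above. The class names `EntropyDemo` and `DirichletMetrics` are placeholders (my real class names differ), and `_2_gammafamily` is assumed to provide working `Gammaln` and `Digamma`. Note that `Entropy` adds `regularizer` to every component internally, so to evaluate at exactly $\alpha = (x, x, x)$, as in the table above, `regularizer` should be set to zero.

using System;

public class EntropyDemo
{
    public static void Main()
    {
        var metrics = new DirichletMetrics(); // Placeholder for the class that holds Entropy().
        foreach (double x in new[] { 0.1, 0.6, 1.0, 1.6, 2.1 })
        {
            double h = metrics.Entropy(new[] { x, x, x }); // Symmetric alpha with three bins.
            Console.WriteLine($"x: {x}, Entropy: {h:F3}");
        }
    }
}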