
How can it be proven that the entropy of a die with equal probability for all its faces is at its maximum?

It's clear that the entropy will be smaller if a particular face has a greater chance than the others, but how can this be proven?

Alexis
whynot
  • You can see my answer [here](https://stats.stackexchange.com/a/383203/103153). I found that this snippet from [this thesis](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.467.2499&rep=rep1&type=pdf#page=56) answers your question exactly: "In information theory, entropy or uncertainty is generally identified with quantity of information. To understand why this correspondence makes sense, consider how the informational state changes when an actual event occurs, if you already knew the underlying probability distribution. In the case of the heavily biased coin, actual flips tel…" – Lerner Zhang Dec 16 '18 at 06:27

1 Answer


It is a direct consequence of the concavity of the function $-x \log(x)$ for arguments between $0$ and $1$.

The entropy of a die with $n$ sides and probabilities $p_1, p_2, \ldots, p_n$ (nonnegative and summing to $1$) is defined to be $-\sum_i p_i \log(p_i)$, which is a continuous function of the $p_i$ over all possible probability assignments (including possibly setting some of them to zero). Taking second derivatives gives a diagonal Hessian with entries $-1/p_i$, showing the function is everywhere concave. This immediately implies it has a unique critical point on the simplex, that none of the $p_i$ is zero there, and that this point is the global maximum. But the entropy is a symmetric function of the $p_i$: permuting the coordinates of that critical point yields another critical point, so by uniqueness all the $p_i$ must be equal there, QED.
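This is not part of the proof, but it is easy to check numerically. Here is a minimal Python/NumPy sketch, assuming a six-sided die; the helper `entropy` and the Dirichlet sampling of the simplex are illustrative choices of mine, not anything the argument requires.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6  # a six-sided die

def entropy(p):
    """Shannon entropy -sum p_i log(p_i), with the convention 0*log(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

# The fair die attains log(n):
uniform = np.full(n, 1.0 / n)
print(entropy(uniform), np.log(n))          # both ≈ 1.7918

# Sample many points of the probability simplex (Dirichlet(1, ..., 1) is
# uniform on it) and confirm none of them beats the fair die:
samples = rng.dirichlet(np.ones(n), size=100_000)
print(max(entropy(p) for p in samples) <= np.log(n))   # True
```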


Edit

There is a proof which is at once elementary and pretty. It uses two simple, well-known ideas:

  1. A function is optimized simultaneously with any monotonic re-expression of its values. In particular, the entropy $H = -\sum p_i \log(p_i)$ is maximized when $e^{-H} = \prod (p_i)^{p_i}$ is minimized.

  2. The (weighted) Geometric Mean-Harmonic Mean Inequality. Let $x_i$ be arbitrary positive numbers and $p_i$ be positive "weights" summing to $1$. The weighted geometric mean (GM) of the $x_i$ is $\prod x_i^{p_i}$. Similarly, the weighted harmonic mean (HM) of the $x_i$ is the reciprocal of $\sum p_i(1/x_i)$. The GM-HM Inequality asserts that $GM \ge HM$ and that equality holds if and only if all the $x_i$ are equal to each other. There are many elementary proofs of this. (A good account of the weighted version of the GM-HM Inequality is difficult to find on the Web, although it is well covered in various texts. See the top of page 5 in Bjorn Poonen's notes on inequalities, for instance.)
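Point 2 is also easy to check empirically. The following NumPy sketch (the helper names `weighted_gm` and `weighted_hm` are my own, purely for illustration) samples arbitrary positive $x_i$ and positive weights $p_i$ summing to $1$, and confirms $GM \ge HM$ with equality when the $x_i$ coincide.

```python
import numpy as np

rng = np.random.default_rng(1)

def weighted_gm(x, p):
    """Weighted geometric mean of x with weights p: prod x_i ** p_i."""
    return np.prod(x ** p)

def weighted_hm(x, p):
    """Weighted harmonic mean of x with weights p: 1 / sum(p_i / x_i)."""
    return 1.0 / np.sum(p / x)

for _ in range(10_000):
    x = rng.uniform(0.1, 10.0, size=6)   # arbitrary positive numbers
    p = rng.dirichlet(np.ones(6))        # positive weights summing to 1
    assert weighted_gm(x, p) >= weighted_hm(x, p) - 1e-12

# Equality exactly when all the x_i coincide:
x = np.full(6, 3.7)
p = rng.dirichlet(np.ones(6))
print(np.isclose(weighted_gm(x, p), weighted_hm(x, p)))   # True
```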

Looking at #1, we recognize $e^{-H}$ as the GM of the $x_i$ with weights $p_i$ where $x_i = p_i$. From the GM-HM Inequality, this value is never less than the HM, which is the reciprocal of $\sum_i p_i \left(1/x_i\right) = \sum_i p_i/p_i = \sum_i 1 = n$. Consequently $e^{-H} \ge 1/n$, that is, $H \le \log(n)$. Moreover, the GM and HM are equal to each other (and therefore both equal to $1/n$) if and only if all the $p_i$ are equal. It is immediate that $H$ is maximized exactly when the $p_i$ are equal, where it attains the maximum value $\log(n)$.
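A quick numerical confirmation of this chain (again a NumPy sketch of my own, not code from the argument above): for random dice, $e^{-H}$ coincides with the weighted GM of the $p_i$ and never falls below $1/n$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6

for _ in range(10_000):
    p = rng.dirichlet(np.ones(n))            # a random die with positive faces
    H = -np.sum(p * np.log(p))               # its entropy
    gm = np.prod(p ** p)                     # weighted GM of the p_i, weights p_i
    assert np.isclose(np.exp(-H), gm)        # the identity from point 1
    assert np.exp(-H) >= 1.0 / n - 1e-12     # GM >= HM = 1/n, i.e. H <= log(n)
```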

This argument covers all but the cases where some of the $p_i$ may be zero. But in such cases, where there are $n' \lt n$ nonzero $p_i$, the foregoing shows that $H$ cannot exceed $\log(n') \lt \log(n)$, whence it is not possible to maximize the entropy by setting any of the probabilities to zero.
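As a concrete instance of this last point (a hypothetical example of mine, not from the argument above): a six-sided die with one face ruled out behaves like a five-sided die, so its entropy is capped at $\log(5) \lt \log(6)$.

```python
import numpy as np

# A six-sided die with one face forced to probability 0:
p = np.array([0.2, 0.2, 0.2, 0.2, 0.2, 0.0])
H = -np.sum(p[p > 0] * np.log(p[p > 0]))
print(H, np.log(5), np.log(6))   # H = log(5) ≈ 1.609 < log(6) ≈ 1.792
```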

whuber